Test environment: Hive on Spark, Spark version 3.3.1

Contents:
I. Execution timing
II. Effect on the result set
III. Efficiency comparison
  1. Inner join: 1) on  2) where
  2. Outer join: 1) on  2) where
IV. Summary
PS

I. Execution timing

In a SQL join, where is a filter condition: it filters the result set produced by the join, so in theory it executes after the join. on is the join condition: it determines which rows can be matched together, so in theory it executes at join time.

In practice, however, most database systems apply optimizations whose shared idea is to push the predicates in where, and the conditions in on, down to the data source side whenever possible, so that less data participates in the join. The actual execution timing therefore often differs from the theoretical one.
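As a sketch of what that pushdown amounts to (t1, t2, k, v are placeholder names for illustration, not the tables tested below), the optimizer effectively turns a filter over the joined result into a join over a pre-filtered source:

-- Placeholder tables t1/t2. Predicate written against the joined result:
select t1.*, t2.v
from t1
join t2 on t1.k = t2.k
where t2.v > 100;

-- What the optimizer effectively executes after pushdown: the filter is
-- applied while scanning t2, before any shuffle or join work happens.
select t1.*, t2f.v
from t1
join (select k, v from t2 where v > 100) t2f
  on t1.k = t2f.k;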
II. Effect on the result set

In an inner join, placing a condition in where or in on makes no difference to the result set.

In an outer join it does. Take a left outer join: a left join preserves every record of the left table, and on takes effect at join time, so the final result set still keeps all rows of the left table. where, by contrast, operates on the result set after the join, so it filters rows out and the two result sets differ.
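A minimal sketch of that difference (users and orders are made-up tables, used only to illustrate the semantics):

-- Hypothetical tables. Condition in on: every row of users survives;
-- rows without a qualifying order get NULL for amount.
select u.id, o.amount
from users u
left join orders o
  on u.id = o.user_id and o.amount > 100;

-- Condition in where: rows whose o.amount is NULL (no match, or no order
-- with amount > 100) are removed after the join, so the left table is no
-- longer fully preserved -- this behaves like an inner join.
select u.id, o.amount
from users u
left join orders o
  on u.id = o.user_id
where o.amount > 100;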
III. Efficiency comparison

Test data volumes:

poi_data.poi_res: about 83 million rows.
bi_report.mon_ronghe_pv: a partitioned table with about 12 billion rows in total. The 20240522 partition used here contributes about 59 million rows to the join, of which about 1.2 million satisfy bid like '1%' and pv > 100.
The join column contains no duplicate values in either table.

1. Inner join

1) on

select
    t1.bid,
    t1.name,
    t1.point_x,
    t1.point_y,
    t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
 and t2.bid like '1%'
 and t2.pv > 100;

== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297, name#299, point_x#316, point_y#317], HiveTableRelation [poi_data.poi_res, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#297, type#298, name#299, address#300, phone#301, alias#302, post_code#303, catalog_id#304, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297))

(3) Exchange
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: hashpartitioning(bid#297, 600), ENSURE_REQUIREMENTS, [plan_id=774]

(4) ShuffleQueryStage
Output [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#334, pv#335, event_day#338]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#338), (event_day#338 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#334, pv#335, event_day#338]

(9) Filter [codegen id : 2]
Input [3]: [bid#334, pv#335, event_day#338]
Condition : (((isnotnull(bid#334) AND isnotnull(pv#335)) AND StartsWith(bid#334, 1)) AND (pv#335 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#334, pv#335]
Input [3]: [bid#334, pv#335, event_day#338]

(11) Exchange
Input [2]: [bid#334, pv#335]
Arguments: hashpartitioning(bid#334, 600), ENSURE_REQUIREMENTS, [plan_id=799]

(12) ShuffleQueryStage
Output [2]: [bid#334, pv#335]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#334, pv#335]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#334, pv#335]
Arguments: [bid#334 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#297]
Right keys [1]: [bid#334]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Input [6]: [bid#297, name#299, point_x#316, point_y#317, bid#334, pv#335]

(17) CollectLimit
Input [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: 1000

(18) Filter
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297))

(19) Exchange
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: hashpartitioning(bid#297, 600), ENSURE_REQUIREMENTS, [plan_id=759]

(20) Sort
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#334, pv#335, event_day#338]
Condition : (((isnotnull(bid#334) AND isnotnull(pv#335)) AND StartsWith(bid#334, 1)) AND (pv#335 > 100))

(22) Project
Output [2]: [bid#334, pv#335]
Input [3]: [bid#334, pv#335, event_day#338]

(23) Exchange
Input [2]: [bid#334, pv#335]
Arguments: hashpartitioning(bid#334, 600), ENSURE_REQUIREMENTS, [plan_id=760]

(24) Sort
Input [2]: [bid#334, pv#335]
Arguments: [bid#334 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#297]
Right keys [1]: [bid#334]
Join condition: None

(26) Project
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Input [6]: [bid#297, name#299, point_x#316, point_y#317, bid#334, pv#335]

(27) CollectLimit
Input [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: isFinalPlan=true

The physical plan shows that in step (2) a Filter with Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)) filters t1 while its source data is read, and in step (7) predicate pushdown filters t2 during the scan with PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]. Both tables are filtered on the data source side, which reduces the shuffle volume and the amount of data participating in the join.
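Plans in this layout can be reproduced by prefixing the statement with Spark's explain formatted (a sketch of the invocation, using the same query as above):

explain formatted
select t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
 and t2.bid like '1%'
 and t2.pv > 100;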
2) where

select
    t1.bid,
    t1.name,
    t1.point_x,
    t1.point_y,
    t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
where t2.bid like '1%'
  and t2.pv > 100;

== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350, name#352, point_x#369, point_y#370], HiveTableRelation [poi_data.poi_res, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#350, type#351, name#352, address#353, phone#354, alias#355, post_code#356, catalog_id#357, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Condition : (StartsWith(bid#350, 1) AND isnotnull(bid#350))

(3) Exchange
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: hashpartitioning(bid#350, 600), ENSURE_REQUIREMENTS, [plan_id=908]

(4) ShuffleQueryStage
Output [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#387, pv#388, event_day#391]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#391), (event_day#391 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#387, pv#388, event_day#391]

(9) Filter [codegen id : 2]
Input [3]: [bid#387, pv#388, event_day#391]
Condition : (((isnotnull(bid#387) AND isnotnull(pv#388)) AND StartsWith(bid#387, 1)) AND (pv#388 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#387, pv#388]
Input [3]: [bid#387, pv#388, event_day#391]

(11) Exchange
Input [2]: [bid#387, pv#388]
Arguments: hashpartitioning(bid#387, 600), ENSURE_REQUIREMENTS, [plan_id=933]

(12) ShuffleQueryStage
Output [2]: [bid#387, pv#388]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#387, pv#388]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#387, pv#388]
Arguments: [bid#387 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#350]
Right keys [1]: [bid#387]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Input [6]: [bid#350, name#352, point_x#369, point_y#370, bid#387, pv#388]

(17) CollectLimit
Input [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: 1000

(18) Filter
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Condition : (StartsWith(bid#350, 1) AND isnotnull(bid#350))

(19) Exchange
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: hashpartitioning(bid#350, 600), ENSURE_REQUIREMENTS, [plan_id=893]

(20) Sort
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#387, pv#388, event_day#391]
Condition : (((isnotnull(bid#387) AND isnotnull(pv#388)) AND StartsWith(bid#387, 1)) AND (pv#388 > 100))

(22) Project
Output [2]: [bid#387, pv#388]
Input [3]: [bid#387, pv#388, event_day#391]

(23) Exchange
Input [2]: [bid#387, pv#388]
Arguments: hashpartitioning(bid#387, 600), ENSURE_REQUIREMENTS, [plan_id=894]

(24) Sort
Input [2]: [bid#387, pv#388]
Arguments: [bid#387 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#350]
Right keys [1]: [bid#387]
Join condition: None

(26) Project
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Input [6]: [bid#350, name#352, point_x#369, point_y#370, bid#387, pv#388]

(27) CollectLimit
Input [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: isFinalPlan=true

The physical plan is unchanged. So when the database supports predicate pushdown, it makes no difference for an inner join whether the filter conditions are written in where or in on: either way the data is filtered on the data source side, reducing the amount of data that participates in the join.
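To convince yourself that this behavior comes from pushdown, Parquet filter pushdown can be switched off for the session and the plan compared (a sketch; spark.sql.parquet.filterPushdown is a standard Spark SQL setting, true by default):

-- Disable Parquet filter pushdown for this session, then re-explain.
set spark.sql.parquet.filterPushdown=false;

explain formatted
select t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
where t2.bid like '1%'
  and t2.pv > 100;

With pushdown disabled, the Parquet scan should no longer list the bid and pv predicates under PushedFilters; they are then applied only in the Filter operator above the scan.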
2. Outer join

1) on

select
    t1.bid,
    t1.name,
    t1.point_x,
    t1.point_y,
    t2.pv
from poi_data.poi_res t1
left join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
 and t2.bid like '1%'
 and t2.pv > 100;

== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin LeftOuter (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=36.5 MiB, rowCount=3.07E+5)
         :        +- Exchange (3)
         :           +- * LocalLimit (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin LeftOuter (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- LocalLimit (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403, name#405, point_x#422, point_y#423], HiveTableRelation [poi_data.poi_res, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#403, type#404, name#405, address#406, phone#407, alias#408, post_code#409, catalog_id#410, c..., Partition Cols: []]

(2) LocalLimit [codegen id : 1]
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 1000

(3) Exchange
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: hashpartitioning(bid#403, 600), ENSURE_REQUIREMENTS, [plan_id=1043]

(4) ShuffleQueryStage
Output [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#440, pv#441, event_day#444]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#444), (event_day#444 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#440, pv#441, event_day#444]

(9) Filter [codegen id : 2]
Input [3]: [bid#440, pv#441, event_day#444]
Condition : (((isnotnull(bid#440) AND isnotnull(pv#441)) AND StartsWith(bid#440, 1)) AND (pv#441 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#440, pv#441]
Input [3]: [bid#440, pv#441, event_day#444]

(11) Exchange
Input [2]: [bid#440, pv#441]
Arguments: hashpartitioning(bid#440, 600), ENSURE_REQUIREMENTS, [plan_id=1067]

(12) ShuffleQueryStage
Output [2]: [bid#440, pv#441]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#440, pv#441]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#440, pv#441]
Arguments: [bid#440 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#403]
Right keys [1]: [bid#440]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Input [6]: [bid#403, name#405, point_x#422, point_y#423, bid#440, pv#441]

(17) CollectLimit
Input [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: 1000

(18) LocalLimit
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 1000

(19) Exchange
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: hashpartitioning(bid#403, 600), ENSURE_REQUIREMENTS, [plan_id=1029]

(20) Sort
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#440, pv#441, event_day#444]
Condition : (((isnotnull(bid#440) AND isnotnull(pv#441)) AND StartsWith(bid#440, 1)) AND (pv#441 > 100))

(22) Project
Output [2]: [bid#440, pv#441]
Input [3]: [bid#440, pv#441, event_day#444]

(23) Exchange
Input [2]: [bid#440, pv#441]
Arguments: hashpartitioning(bid#440, 600), ENSURE_REQUIREMENTS, [plan_id=1030]

(24) Sort
Input [2]: [bid#440, pv#441]
Arguments: [bid#440 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#403]
Right keys [1]: [bid#440]
Join condition: None

(26) Project
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Input [6]: [bid#403, name#405, point_x#422, point_y#423, bid#440, pv#441]

(27) CollectLimit
Input [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: isFinalPlan=true

Because in a left join the conditions in on are join conditions and the result must preserve every record of the left table, t1 is read in full with no predicate filter applied, while t2 is still filtered via predicate pushdown. (The LocalLimit in steps (2)/(18) is not a data filter: it appears to be the limit of 1000 rows visible in CollectLimit, presumably appended by the query client, pushed down to the preserved side of the left join.)
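If the query does not actually need the left rows outside bid like '1%', the left side can be reduced explicitly with a subquery (a sketch of a common manual rewrite; note it is not equivalent to the on variant above, which keeps every row of poi_res):

-- Pre-filter the left side before joining; left-join semantics are kept
-- only for the rows that survive the bid filter.
select t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from (select bid, name, point_x, point_y
      from poi_data.poi_res
      where bid like '1%') t1
left join (select bid, pv
           from bi_report.mon_ronghe_pv
           where event_day='20240522' and pv > 100) t2
  on t1.bid = t2.bid;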
2) where

select
    t1.bid,
    t1.name,
    t1.point_x,
    t1.point_y,
    t2.pv
from poi_data.poi_res t1
left join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
  on t1.bid = t2.bid
where t2.bid like '1%'
  and t2.pv > 100;

== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456, name#458, point_x#475, point_y#476], HiveTableRelation [poi_data.poi_res, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#456, type#457, name#458, address#459, phone#460, alias#461, post_code#462, catalog_id#463, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Condition : (StartsWith(bid#456, 1) AND isnotnull(bid#456))

(3) Exchange
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: hashpartitioning(bid#456, 600), ENSURE_REQUIREMENTS, [plan_id=1176]

(4) ShuffleQueryStage
Output [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#493, pv#494, event_day#497]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#497), (event_day#497 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#493, pv#494, event_day#497]

(9) Filter [codegen id : 2]
Input [3]: [bid#493, pv#494, event_day#497]
Condition : (((isnotnull(bid#493) AND isnotnull(pv#494)) AND StartsWith(bid#493, 1)) AND (pv#494 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#493, pv#494]
Input [3]: [bid#493, pv#494, event_day#497]

(11) Exchange
Input [2]: [bid#493, pv#494]
Arguments: hashpartitioning(bid#493, 600), ENSURE_REQUIREMENTS, [plan_id=1201]

(12) ShuffleQueryStage
Output [2]: [bid#493, pv#494]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#493, pv#494]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#493, pv#494]
Arguments: [bid#493 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#456]
Right keys [1]: [bid#493]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Input [6]: [bid#456, name#458, point_x#475, point_y#476, bid#493, pv#494]

(17) CollectLimit
Input [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: 1000

(18) Filter
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Condition : (StartsWith(bid#456, 1) AND isnotnull(bid#456))

(19) Exchange
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: hashpartitioning(bid#456, 600), ENSURE_REQUIREMENTS, [plan_id=1161]

(20) Sort
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#493, pv#494, event_day#497]
Condition : (((isnotnull(bid#493) AND isnotnull(pv#494)) AND StartsWith(bid#493, 1)) AND (pv#494 > 100))

(22) Project
Output [2]: [bid#493, pv#494]
Input [3]: [bid#493, pv#494, event_day#497]

(23) Exchange
Input [2]: [bid#493, pv#494]
Arguments: hashpartitioning(bid#493, 600), ENSURE_REQUIREMENTS, [plan_id=1162]

(24) Sort
Input [2]: [bid#493, pv#494]
Arguments: [bid#493 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#456]
Right keys [1]: [bid#493]
Join condition: None

(26) Project
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Input [6]: [bid#456, name#458, point_x#475, point_y#476, bid#493, pv#494]

(27) CollectLimit
Input [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: isFinalPlan=true

where is a filter condition and shapes the final result of the left join, so the plan moves it ahead of the join: step (2) filters t1 by bid before the join. Note also that the plan now shows SortMergeJoin Inner rather than LeftOuter: since the where predicates on t2 reject NULLs, the left outer join degenerates into an inner join.

IV. Summary

Assuming the database system supports predicate pushdown:

Inner join: both inner-join plans filter t2 with PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)] and filter t1 with Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)). For an inner join, therefore, where and on make no difference in execution efficiency.

Outer join: taking the left outer join again, conditions on the right table are pushed down in either case; whether the left table is filtered early depends on where versus on, and on whether the predicates involve the left table. 1) With on, the left table must be read in full, so the cost difference mainly depends on the size of the left table. 2) With where, if the predicates involve the left table, the data is filtered early; otherwise the left table is still read in full.

PS

In the inner-join physical plans, the filter on poi_res appears as its own Filter step (2), Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)), whereas the filter on mon_ronghe_pv happens inside the scan in step (7), PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]. What is the difference? From the material I found: PushedFilters can be understood as filtering at read time, so rows that fail the condition are simply never read; Filter reads the rows first and only then evaluates the condition to decide whether they take part in subsequent computation.

Since both happen on the data source side, why can't Filter evaluate while reading, as PushedFilters does, and reduce the volume of data read? That was my initial question. Again from the material I found: whether a scan can filter data as it reads depends mainly on the data source. Big-data storage formats are broadly row-oriented or column-oriented, and columnar formats, with their storage layout and rich metadata, give much better support for predicate pushdown. In this test, mon_ronghe_pv is stored as Parquet while poi_res is stored as text.
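Following on from the PS: one practical way to give a text-format table the same scan-time filtering is to rewrite it in a columnar format (a sketch; the target table name is made up):

-- Hypothetical table name. Parquet scans support pushed filters;
-- LazySimpleSerDe text tables do not, so their predicates surface as a
-- separate Filter operator above a full scan.
create table poi_data.poi_res_parquet stored as parquet
as select * from poi_data.poi_res;

After the rewrite, the same predicates on bid should appear under PushedFilters in the scan instead of in a standalone Filter step.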