
拆老姐机子有感
主板
1.atx matx itx

老姐显然用的使ATX
显卡
不懂但是是
NVIDIA GeForce GTX 970
15年左右的神卡
运行内存
两张8GB=16GB

CPU
不懂但是使16年的i7-4790 3.60Ghz
散热器
不懂,但是是立式的扎手散热器
最后

重装系统
1.https://www.microsoft.com/zh-cn/software-download/windows10/下载
2.下载到除了桌面的文件里面
3.双击点开
4.加载入u盘里(tmd会格式化)
5.插入(新)电脑
6.开机狂按F2/F1/del进入bios
7.选择优先加载u盘
8.开机设置
9.淘宝买了激活码,现在还没解决激活问题,老爸那边没有网线了,等待无线网卡到货
六、暑假作业
1 | create database ods_edu; |
1 | create database app_edu_chat; |
mysql导入hive
1 | sqoop import \ --connect jdbc:mysql://node1:3306/nev \ |
五、一些完成的sql
1 | -- 1.数据仓库构建-- 1.1创建ods库create database if not exists ods_didi;-- 1.2创建dw数据库create database if not exists dw_didi;-- 1.3创建app数据库create database if not exists app_didi;use ods_didi;-- 2.在ods层创建表-- 2.1创建订单结构表-- 创建用户订单表结构create table if not exists ods_didi.t_user_order( orderId string comment '订单id', telephone string comment '打车用户手机', lng string comment '用户发起打车的经度', lat string comment '用户发起打车的纬度', province string comment '所在省份', city string comment '所在城市', es_money double comment '预估打车费用', gender string comment '用户信息 - 性别', profession string comment '用户信息 - 行业', age_range string comment '年龄段(70后、80后、...)', tip double comment '小费', subscribe int comment '是否预约(0 - 非预约、1 - 预约)', sub_time string comment '预约时间', is_agent int comment '是否代叫(0 - 本人、1 - 代叫)', agent_telephone string comment '预约人手机', order_time string comment '订单时间') partitioned by (dt string comment '时间分区') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';-- ods创建取消订单表create table if not exists ods_didi.t_user_cancel_order( orderId string comment '订单ID', cstm_telephone string comment '客户联系电话', lng string comment '取消订单的经度', lat string comment '取消订单的纬度', province string comment '所在省份', city string comment '所在城市', es_distance double comment '预估距离', gender string comment '性别', profession string comment '行业', age_range string comment '年龄段', reason int comment '取消订单原因(1 - 选择了其他交通方式、2 - 与司机达成一致,取消订单、3 - 投诉司机没来接我、4 - 已不需要用车、5 - 无理由取消订单)', cancel_time string comment '取消时间') partitioned by (dt string comment '时间分区') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';-- ods创建订单表支付表create table if not exists ods_didi.t_user_pay_order( id string comment '支付订单ID', orderId string comment '订单ID', lng string comment '目的地的经度(支付地址)', lat string comment '目的地的纬度(支付地址)', province string comment '省份', city string comment '城市', total_money double comment '车费总价', real_pay_money double comment '实际支付总额', passenger_additional_money double comment '乘客额外加价', base_money double comment '车费合计', has_coupon int comment '是否使用优惠券(0 - 不使用、1 - 使用)', coupon_total double comment '优惠券合计', pay_way int comment '支付方式(0 - 微信支付、1 - 支付宝支付、3 - QQ钱包支付、4 - 一网通银行卡支付)', mileage double comment '里程(单位公里)', pay_time string comment '支付时间') partitioned by (dt string comment '时间分区') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';-- ods创建用户评价表create table if not exists ods_didi.t_user_evaluate( id string comment '评价日志唯一ID', orderId string comment '订单ID', passenger_telephone string comment '用户电话', passenger_province string comment '用户所在省份', passenger_city string comment '用户所在城市', eva_level int comment '评价等级(1 - 一颗星、... 5 - 五星)', eva_time string comment '评价时间') partitioned by (dt string comment '时间分区') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';-- 创建数据仓库 导入数据load data local inpath '/export/data/didi/order.csv' into table ods_didi.t_user_order partition (dt = '2020-04-12');load data local inpath '/export/data/didi/cancel_order.csv' into table ods_didi.t_user_cancel_order partition (dt = '2020-04-12');load data local inpath '/export/data/didi/pay.csv' into table ods_didi.t_user_pay_order partition (dt = '2020-04-12');load data local inpath '/export/data/didi/evaluate.csv' into table ods_didi.t_user_evaluate partition (dt = '2020-04-12');-- truncate table dw_didi.t_user_pay_order;-- 3.在dw层进行数据预处理use dw_didi;-- 创建宽表语句create table if not exists dw_didi.t_user_order_wide( orderId string comment '订单id', telephone string comment '打车用户手机', lng string comment '用户发起打车的经度', lat string comment '用户发起打车的纬度', province string comment '所在省份', city string comment '所在城市', es_money double comment '预估打车费用', gender string comment '用户信息 - 性别', profession string comment '用户信息 - 行业', age_range string comment '年龄段(70后、80后、...)', tip double comment '小费', subscribe int comment '是否预约(0 - 非预约、1 - 预约)', subscribe_name string comment '是否预约名称', sub_time string comment '预约时间', is_agent int comment '是否代叫(0 - 本人、1 - 代叫)', is_agent_name string comment '是否代缴名称', agent_telephone string comment '预约人手机', order_time string comment '订单时间', order_date string comment '订单时间,yyyy-MM-dd', order_year string comment '年', order_month string comment '月', order_day string comment '日', order_hour string comment '小时', order_time_range string comment '时间段') partitioned by (dt string comment '2020-04-12') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';-- 预处理sql语句 用户订单处理insert overwrite table dw_didi.t_user_order_wide partition (dt = '2020-04-12')select orderid, telephone, lng, lat, province, city, es_money, gender, profession, age_range, tip, subscribe,-- if(nvl(subscribe, 0) = 0, '非预约', '预约') as subscribe_name, case when subscribe = 0 or (subscribe is null) then '非预约' when subscribe = 1 then '预约' end as subscribe_name, date_format(sub_time, 'yyyy-MM-dd') as sub_time, is_agent, case when is_agent = 0 or (subscribe is null) then '本人' when is_agent = 1 then '代叫' end as is_agent_name, agent_telephone,-- substr(order_time, 1, 4) as year, date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss') as order_time, date_format(order_time, 'yyyy-MM-dd') as order_data, year(date_format(order_time, 'yyyy-MM-dd')) as order_year, month(date_format(order_time, 'yyyy-MM-dd')) as order_month, day(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) as order_day, hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) as order_hour, case when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 1 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 5 then '凌晨' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 5 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 8 then '早上' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 8 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 11 then '上午' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 11 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 13 then '中午' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 13 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 17 then '下午' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 17 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 19 then '晚上' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 19 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 20 then '半夜' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 20 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 24 then '深夜' when hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) >= 0 and hour(date_format(concat(order_time, ':00'), 'yyyy-MM-dd HH:mm:ss')) < 1 then '凌晨' end-- date_format(order_time, 'yyyy-MM-dd HH:mm:ss'),from ods_didi.t_user_orderwhere length(order_time) >= 8 and dt = '2020-04-12';create table if not exists dw_didi.t_user_cancel_order( orderId string, Profession string, age_range string, Reason string, cancel_time string) partitioned by (dt string comment '2020-04-12') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';insert overwrite table dw_didi.t_user_cancel_order partition (dt = '2020-04-12')select orderId, profession, age_range, reason, cancel_timefrom ods_didi.t_user_cancel_orderwhere dt = '2020-04-12';create table if not exists dw_didi.t_user_pay_order( id string comment '支付订单ID', orderId string comment '订单ID', real_pay_money double comment '实际支付总额', has_coupon int comment '是否使用优惠券(0 - 不使用、1 - 使用)', pay_way int comment '支付方式(0-微信支付、1-支付宝支付、3-QQ钱包支付、4- 一网通银行卡支付)', mileage double comment '里程(单位公里)', pay_time string comment '支付时间') partitioned by (dt string comment '2020-04-12') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';insert overwrite table dw_didi.t_user_pay_order partition (dt = '2020-04-12')select id, orderId, real_pay_money, has_coupon, pay_way, mileage, pay_timefrom ods_didi.t_user_pay_orderwhere dt = '2020-04-12';create table if not exists dw_didi.t_user_evaluate( id string comment '评价日志唯一ID', orderId string comment '订单ID', eva_level int comment '评价等级(1 - 一颗星、... 5 - 五星)', eva_time string comment '评价时间') partitioned by (dt string comment '时间分区') ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';insert overwrite table dw_didi.t_user_evaluate partition (dt = '2020-04-12')select id, orderId, eva_level, eva_timefrom ods_didi.t_user_evaluatewhere dt = '2020-04-12';-- 4.数据处理-- 4.1 总订单笔数分析-- 4.1.1计算4.12的总订单笔数分析select max(dt) as `时间`, count(orderid) `订单总笔数`from dw_didi.t_user_order_widewhere dt = '2020-04-12';--建表create table if not exists app_didi.t_order_total( date_val string comment '日期(yyyy-MM-dd)', count int comment '订单笔数') partitioned by (month string comment '按月分区yyyy-MM') row format delimited fields terminated by ',';--加载数据据insert into table app_didi.t_order_total partition (month = '2020-04')select max(dt) as `时间`, count(orderid) as `订单时间`from dw_didi.t_user_order_widewhere dt = '2020-04-12';-- truncate table app_didi.t_order_total;-- 4.2 预约订单/非预约订单占比分析-- sum,avg,max,min-- 预约单/总单*100%select count(*) as cnt_totalfrom dw_didi.t_user_order_widewhere dt = '2020-04-12';select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by subscribe_name;--左连接select *from (select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by subscribe_name) t1 left join (select count(*) as cnt_total from dw_didi.t_user_order_wide where dt = '2020-04-12') t2;-- 隐式内连接select `日期`, `是否预约`, concat(round(cnt / cnt_total * 100, 2), '%') as `百分比`from (select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by subscribe_name) t1, (select count(*) as cnt_total from dw_didi.t_user_order_wide where dt = '2020-04-12') t2;--开窗函数select order_date as `日期`, subscribe_name as `是否预约`, count(subscribe_name) over (partition by subscribe_name) cnt, count() over () cnt_totalfrom dw_didi.t_user_order_widewhere dt = '2020-04-12';-- group by subscribe_name;--开窗函数2select max(`日期`) as `日期`, `是否预约`, concat(round(max(cnt) / max(cnt_total) * 100, 2), '%') `百分比`from (select order_date as `日期`, subscribe_name as `是否预约`, count(subscribe_name) over (partition by subscribe_name) cnt, count() over () cnt_total from dw_didi.t_user_order_wide where dt = '2020-04-12') tgroup by `是否预约`;--方法三select `日期`, `是否预约`, concat(round((cnt / sum(cnt) over ()) * 100, 2), '%') `百分比`from (select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by subscribe_name) t;create table if not exists app_didi.t_order_subscribe_percent( date_val string comment '日期', subscribe_name string comment '是否预约', percent_val string comment '百分比') partitioned by (month string comment '年月yyyy-MM') row format delimited fields terminated by ',';select `日期`, `是否预约`, cnt / sum(cnt) over () * 100from (select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by subscribe_name) t;insert overwrite table app_didi.t_order_subscribe_percent partition (month = '2020-04')select `日期`, `是否预约`, concat(round((cnt / sum(cnt) over ()) * 100, 2), '%') `百分比`from (select max(order_date) as `日期`, subscribe_name as `是否预约`, count(subscribe_name) as cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by subscribe_name) t;-- 4.3不同时段订单的个数create table if not exists app_didi.t_order_timerange_total( datetime string comment '日期', timerange string comment '时间段', count int comment '订单数量') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';--sqlselect max(dt), order_time_range, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by order_time_range;--加载数据insert overwrite table app_didi.t_order_timerange_total partition (month = '2020-04')select max(dt), order_time_range, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by order_time_range;--4.4不同年龄段、时段订单个数select max(dt), age_range, order_time_range, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by age_range, order_time_range;create table if not exists app_didi.t_order_age_and_time_range_total( datetime string comment '日期', age_range string comment '年龄段', order_time_range string comment '时段', count int comment '订单数量') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';insert overwrite table app_didi.t_order_age_and_time_range_total partition (month = '2020-04')select max(dt), age_range, order_time_range, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by age_range, order_time_range;--4.4不同地域订单个数select province, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by province;--建表create table if not exists app_didi.t_order_province_total( datetime string comment '日期', province string comment '省份', count int comment '订单数量') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';insert overwrite table app_didi.t_order_province_total partition (month = '2020-04')select '2020-04-12', province, count(*) as order_cntfrom dw_didi.t_user_order_widewhere dt = '2020-04-12'group by province;-- 4.5求订单客户职业排名top5-- 第一步 :按职业分组求客户数量select max(dt), profession, count(orderId)from dw_didi.t_user_order_widewhere dt = '2020-04-12'group by profession;-- 第二部 排名select dt1, profession, cnt, row_number() over (order by cnt desc )from (select max(dt) as dt1, profession, count(orderId) cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by profession) t;-- 取前五select *from (select dt1, profession, cnt, row_number() over (order by cnt desc ) as rk from (select max(dt) as dt1, profession, count(orderId) cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by profession) t1) t2where rk <= 5;with t1 as (select max(dt) dt1, profession, count(orderId) cnt from dw_didi.t_user_order_wide where dt = '2020-04-12' group by profession), t2 as (select dt1,profession,cnt, row_number() over (order by cnt desc ) as rk from t1)select *from t2where rk<=5;select *from (select t.profession, t.cnt, rank() over (order by t.cnt desc ) as rk from (select profession, count(*) as cnt from dw_didi.t_user_order_wide group by profession) t) ttwhere tt.rk <= 5;--建表create table if not exists app_didi.t_order_profession_total_topn( profession string comment '职业', Order_cnt int comment '订单数量', rk int comment '排名') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';--加载数据insert overwrite table app_didi.t_order_profession_total_topn partition (month = '2020-04')select *from (select t.profession, t.cnt, rank() over (order by t.cnt desc ) as rk from (select profession, count(*) as cnt from dw_didi.t_user_order_wide group by profession) t) ttwhere tt.rk <= 5;--4.6用户订单取消占比select '2020-04-12' date_val, concat(round(t1.total_cnt / t2.total_cnt * 100, 2), '%') as cancel_order_percentfrom (select count(orderid) as total_cnt from ods_didi.t_user_cancel_order where dt = '2020-04-12') t1 , (select count(orderid) as total_cnt from dw_didi.t_user_order_wide where dt = '2020-04-12') t2;--创建表create table if not exists app_didi.t_order_cancel_order_percent( datetime string comment '日期', cancel_order_percent string comment '百分比') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';--加载数据insert overwrite table app_didi.t_order_cancel_order_percent partition (month = '2020-04')select '2020-04-12' date_val , concat(round(t1.total_cnt / t2.total_cnt * 100, 2), '%') as percent_valfrom (select count(*) total_cnt from ods_didi.t_user_cancel_order where dt = '2020-04-12') t1 , (select count(*) total_cnt from dw_didi.t_user_order_wide where dt = '2020-04-12') t2;-- 4.8统计用户取消订单原因top1with t1 as(select reason,profession, count(reason) over() cnt from dw_didi.t_user_cancel_order where dt = '2020-04-12' ), t2 as(select reason,profession, cnt, row_number() over(order by cnt desc) as rk from t1 )select *from t2where rk<=5;insert overwrite table app_didi.t_order_cancel_reason partition (month = '2020-04')select *from( select t.profession, t.cnt, rank() over (order by t.cnt desc ) as rk from (select profession, count(*) as cnt from dw_didi.t_user_cancel_order group by profession) t ) ttwhere tt.rk <= 5;--建表create table if not exists app_didi.t_order_cancel_reason( profession string comment '职业', cancel_cnt int comment '订单数量', rk int comment '排名')partitioned by (month string comment '年月,yyyy-MM')row format delimited fields terminated by ',';--统计每个省订单量最高的城市top3select city, count(city)from dw_didi.t_user_order_widewhere length(city) > 0group by city;--建表insert overwrite table app_didi.t_order_city partition (month = '2020-04')select *from( select t.city, t.cnt, rank() over (order by t.cnt desc ) as rk from (select city, count(city) as cnt from dw_didi.t_user_order_wide where length(city) > 0 group by city) t ) ttwhere tt.rk <= 3;create table if not exists app_didi.t_order_city( city string comment '城市', order_cnt int comment '订单数量', rk int comment '排名') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';--统计订单支付中使用优惠券的百分比create table if not exists app_didi.t_order_dicount( isdicount string comment '是否使用优惠券', order_cnt string comment '百分比')partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';insert overwrite table app_didi.t_order_dicount partition (month = '2020-04')select `是否使用优惠券`, concat(round((cnt / sum(cnt) over ()) * 100, 2), '%') `百分比`from (select has_coupon as `是否使用优惠券`, count( has_coupon) as cnt from dw_didi.t_user_pay_order where has_coupon!=17 group by has_coupon) t;--统计用户五星级好评的百分比create table if not exists app_didi.t_order_five_start( fivestart string comment '是否是5', order_cnt string comment '百分比') partitioned by (month string comment '年月,yyyy-MM') row format delimited fields terminated by ',';insert overwrite table app_didi.t_order_five_start partition (month = '2020-04')select `是否是5`, concat(round((cnt / sum(cnt) over ()) * 100, 2), '%') `百分比`from (select eva_level as `是否是5`, count(eva_level ) as cnt from dw_didi.t_user_evaluate where length(eva_level)>0 group by eva_level ) t; |
1 | create table if not exists app_didi.t_order_subscribe_percent |
1 | -- 1.数据仓库构建 |
1 | create database app_edu_chat; |
1 | create database ods_edu; |
三、基本sql
一、数据库创建与删除
1.强制删除数据库
1 | drop database itcast cascade; |
2.创建数据库
1 | create database if not exists myhive; |
3. 切换使用数据库
1 | use myhive; |
4. 查看数据库详细信息
1 | desc database myhive; |

5.删除数据库
1 | drop database myhive; |
6.查询当前数据库
1 | select current_database(); |
二、数据库表
1.删除表
1 | drop table biao ; |
2.创建表
1 | create table t_archer( id int comment "ID", |
3.Hive建表时候的字段类型
| 分类 | 类型 | 描述 | 字面量示例 |
|---|---|---|---|
| 原始类型 | BOOLEAN | true/false | TRUE |
| TINYINT | 1字节的有符号整数 -128~127 | 1Y | |
| SMALLINT | 2个字节的有符号整数,-32768~32767 | 1S | |
| INT | 4个字节的带符号整数 | 1 | |
| BIGINT | 8字节带符号整数 | 1L | |
| FLOAT | 4字节单精度浮点数1.0 | ||
| DOUBLE | 8字节双精度浮点数 | 1.0 | |
| DEICIMAL | 任意精度的带符号小数 | 1.0 | |
| STRING | 字符串,变长 | “a”,’b’ | |
| VARCHAR | 变长字符串 | “a”,’b’ | |
| CHAR | 固定长度字符串 | “a”,’b’ | |
| BINARY | 字节数组 | 无法表示 | |
| TIMESTAMP | 时间戳,毫秒值精度 | 122327493795 | |
| DATE | 日期 | ‘2016-03-29’ | |
| INTERVAL | 时间频率间隔 | ||
| 复杂类型 | ARRAY | 有序的的同类型的集合 | array(1,2) |
| MAP | key-value,key必须为原始类型,value可以任意类型 | map(‘a’,1,’b’,2) | |
| STRUCT | 字段集合,类型可以不同 | struct(‘1’,1,1.0), named_stract(‘col1’,’1’,’col2’,1,’clo3’,1.0) | |
| UNION | 在有限取值范围内的一个值 | create_union(1,’a’,63) |
4.创建内部表
1 | create external table student_exter( |
5.根据查询结果创建表
1 | select * |
6.去重结果
1 | select distinct ssex |
7.根据已经存在的表结构创建表
1 | create table stu4 like stu2; |
8.查询表的类型
1 | desc formatted stu2; |
显示表的简要信息
1 | desc stu2; |
9.数据装载载命令Load
1 | load data [local] inpath '/export/data/hive_data/student.txt' [overwrite] into table student [partition (partcol1=val1,…)]; |
1、load data:表示加载数据
2、local: 表示从本地加载数据到hive表;否则从HDFS加载数据到hive表
3、inpath:表示加载数据的路径
4、overwrite:表示覆盖表中已有数据,否则表示追加
5、into table:表示加载到哪张表
6、student:表示具体的表
7、partition:表示上传到指定分区
1 | load data local inpath '/export/data/didi/order.csv' into table ods_didi.t_user_order partition (dt = '2020-04-12'); |
1 | select ssex |
sqoop import
–connect jdbc:mysql://node1:3306/
–username root
–password 123456
–table web_chat_ems_2019_07
–warehouse-dir /user/hive/warehouse –hive-database ods_edu
–hive-import
–hive-table web_chat_ems_2019_07
四、一些sql操作
静态分区

1 | # 静态分区 |

1 | --动态分区 |


1 | -- 追加 |
1 | --复写 |
例子





删除数据
1 | truncate table score3; |



复制表结构

分桶


1 | -- 1.数据仓库构建 |
1 |
with写法




sqoop list-databases –connect jdbc:mysql://192.168.52.161:3306/ –username root –password 123456

1 | grant all privileges on *.* to 'root'@'node1' identified by '123456' with grant option; |
1 | create database if not exists app_didi; |
1 | create table if not exists app_didi.t_order_subscribe_total( |
1 | /export/server/sqoop-1.4.6/bin/sqoop export \ |
1 | /export/server/sqoop-1.4.6/bin/sqoop export \ |
1 | #导出不同时段订单统计表 |
二、常用命令(copy)
- 上传到hdfs
1 | hadoop fs -put archer.txt /user/hive/warehouse/test.db/t_archer |
- 关机
1 | shutdown -h now |
- 重启
1 | reboot |
- 查看进程
1 | ps |
- mysql 活着管道
1 | ps -ef| grep mysql |
- jps全称
1 | java process connect |
- 查看进程
1 | jps java process server |
- 启动进程脚本(根目录cd ~)
1 | /onekey/my-start-all.sh |
- 关闭进程脚本
1 | /onekey/my-stop-all.sh关闭进程脚本 |
- 查看 网关linux
1 | ifconfig |
- 网关 windows
1 | ipconfig |
- 进入mysql
1 | mysql -uroot -p |
- 查看端口
1 | isof -f 10000 |
- 杀死端口
1 | kill -9 xxxx |
- 查看详细端口运行
1 | jps -m |
- 查看指定端口
1 | lsof -i:10000 |
- 展开列表
1 | alt +enter ctrl+enter |
- 格式化
1 | crtl+alt+L |
未命名
滴滴出行
1 | -- 1.数据仓库构建 |
一、集群
root
123456
shutdown -h now 关机
第一章 集群
1.1. 安装集群
将集群目录 复制 d:/opt/shixun2 目录中

双击 node1.vmx 文件即可启动




1.2. 配置网卡(每天开机后都需要配置)
1.2.1. 配置vmware的网卡

1.2.2. 配置windows系统的网卡信息





1.2.3. 测试是否可以连上互联网?

1.3. 通过crt客户端操作linux
1.3.1. 通过在windows中是否可以连上linux系统


1.3.2. 通过客户端操作linux系统




1.4. 一键启动集群
1 | /onekey/my-start-all.sh |

1.5. 一键启动hive
1 | cd /export/server/hive-2.1.0/bin |

1 | show databases; |

1.6. 查看进程
1 | jps |

1.7. 集群的页面访问
1.7.1. IP访问
一旦Hadoop集群启动并运行,可以通过web-ui进行集群查看,如下所述:
查看NameNode页面地址:

查看Yarn集群页面地址:
http://192.168.52.161:8088/cluster

查看MapReduce历史任务页面地址:
http://192.168.52.161:19888/jobhistory

1.7.2. 主机名访问
请注意,以上的访问地址只能使用IP地址,如果想要使用主机名,则对Windows进行配置。
配置方式:
1、打开Windows的C:\Windows\System32\drivers\etc目录下hosts文件
2、在hosts文件中添加以下域名映射
1 | 192.168.52.161 node1 node1.itcast.cn |
配置完之后,可以将以上地址中的IP替换为主机名即可访问,如果还不能访问,则需要重启Windows电脑,比如访问NameNode,可以使用http://node1:50070/ 。
1.8. Hadoop初体验: HDFS使用

1、从Linux本地上传一个文本文件到hdfs的/目录下
1 | #在/export/data/目录中创建a.txt文件,并写入数据 |
1、通过页面查看
通过NameNode页面.进入HDFS:http://node1:50070/

查看文件是否创建成功.

1.9. 一键关闭集群
1.9.1. 关闭 hdfs 和 yarn 和 historyserver
1 | /onekey/my-stop-all.sh |

1.9.2. 关闭跟hive相关的服务
查询进程号
1 | jps |

杀掉 RunJar 对应的进程号
1 | kill -9 xxxx |

1.9.3. 关闭系统
1 | shutdown -h now |

1.10. 给集群拍摄快照



注意: 一定要在关机的状态下拍摄快照!
linux安装一些软件

开终端或 Anaconda Prompt。
创建一个新的环境(可选):
1
2codeconda create --name mypysparkenv
conda activate mypysparkenv安装 PySpark:
1
codeconda install -c conda-forge pyspark
输入以下命令,以退出 Anaconda 环境:
1
Copy codeconda deactivate
激活
1 | conda activate mypysparkenv |
查看版本
1 | import pyspark |
1 | conda install -n mypysparkenv ipykernel --update-deps --force-reinstall |
1 | conda install -n mypysparkenv ipykernel --update-deps --force-reinstall |
1 | export SPARK_HOME=/path/to/spark/installation/in/anaconda/envs/myenv/lib/pythonX.X/site-packages/pyspark |
1 | export PYSPARK_HOME=/usr/local/spark |
有关deb文件
1 | sudo dpkg -i mysql-connector-j_8.0.33-1ubuntu22.04_all.deb |