0. Data Lake Analytics(简称 DLA)介绍
数据湖 (Data Lake) 是时下热门的概念, 更多阅读可以参考:
https://en.wikipedia.org/wiki/Data_lake
基于数据湖, 可以不用做任何 ETL, 数据搬迁等过程, 实现跨各种异构数据源进行大数据关联分析, 从而极大的节省成本和提升用户体验.
以及 AWS 和 Azure 关于 Data Lake 的解读:
终于, 阿里云现在也有了自己的数据湖分析产品: https://www.aliyun.com/product/datalakeanalytics
可以点击申请使用(目前公测阶段还属于邀测模式), 体验本教程分析 OTS 数据之旅.
产品文档: https://help.aliyun.com/product/70174.html
1. Table Store(简称 OTS)介绍
关于 Table Store 的详细介绍, 请看: https://help.aliyun.com/document_detail/27280.html
2. DLA 和 OTS 等存储引擎间的关系
DLA 是上层 MPP 架构的 SQL 执行引擎, 以 MySQL 语法作为语法 API 而实现, 解决实时 OLAP 分析需求;
OTS 是下层数据存储引擎, 基于 LSM 实现, 与 HBase,BigTable 有类似的设计和实现;
DLA 支持连接多种存储引擎, 除了 OTS, 还有 OSS,ADS,RDS 等, 并且这些引擎可以做实时混合分析;
DLA 在计算一个 Query 时, 通过 OTS 核心接口, 查询下层数据并参与上层分析;
DLA 是大小写不敏感的, 而 OTS 是大小写敏感的;
3. OTS 和 DLA 元信息映射逻辑
库和表等概念映射
OTS 概念 | DLA 概念 |
---|---|
实例(instance) | schema 或 database,不同的用户不同的叫法 |
表(table) | table |
主键列(pk) | column,isPrimaryKey=true,isNullable=false |
非主键列(column) | column,isPrimaryKey=false,isNullable=< 看用户的 DDL 定义 & gt; |
字段的映射关系
OTS | DLA |
---|---|
INTEGER(8bytes) | bigint(8bytes) |
STRING | varchar |
BINARY | varbinary |
DOUBLE | double |
BOOLEAN | byte |
4. 购买 OTS 的实例, 并写入数据
下面, 我们开始真正的操作:
开通 OTS 服务, 并购买 OTS 实例: https://help.aliyun.com/document_detail/27293.html
进入 OTS 的管理控制台, 选择合适的 region, 创建实例和表: https://ots.console.aliyun.com/index
当然, 也可以选择通过 SDK 来创建表并写入数据: https://help.aliyun.com/document_detail/43005.html
5. 查看 OTS 的实例, 获取关键信息
下面, 我就以我们的测试数据, 来开启整个过程(跳过具体的申请步骤):
1. 查看目前 DLA 已开通的 Region, 并确保与你的 OTS 在同一个 Region:
2. 进入 OTS 管理控制台, 选择杭州 Region, 查看我的实例(标准 TPC-H 生成的测试集, 有 8 张表; 已提前建好库表, 并通过 SDK 写入了数据):
3. 查看实例信息, 看到相关的__endpoint(DLA 目前支持公网, 所以请选择私网)__, 这里以 hz-tpch-1x-vol 作测试:
4. 查看 nation 表定义 (表名, 主键名, 主键类型, 多主键顺序等) 和数据, 用作后续对比测试:
6. 用户开通 DLA 账号步骤:
用户具备了阿里云账号(主账号);
用户进入产品介绍页, 开通 DLA 并进入控制台: https://www.aliyun.com/product/datalakeanalytics
等用户开通之后, 会在你的短信, 站内信, 邮箱收到账号相关的信息(内容模板可能会升级):
用户通过在页面上查看一下, 得到如下的访问入口信息:
如下是基于 MySQL/jdbc 方式通过公网经典 endpoint 连接到 dla 杭州集群:
MySQL 命令行:
MySQL -h < 您的 DLA 经典 endpoint, 在 DLA 的 console 上> -P10000 -u<dla_username> -p<dla_password> -c -A
- JDBC URL:
- jdbc:MySQL://<您的 DLA 经典 endpoint, 在 DLA 的 console 上>:10000/
- username=<dla_username>
- password=<dla_password>
7. DLA 和 OTS 网络连通性问题
目前 DLA 和 OTS 服务之间, 通过 VPC 相关的策略, 是直接为用户打通网络环境的, 用户无需担心这个过程. 但 DLA 目前不支持公网访问, 请__务必使用 OTS 的 VPC Endpoint!__
8. 使用 DLA, 连接你的 OTS, 进行查询和分析
注: 我们是多租户场景的, 所以新用户刚进去时看不到任何库表;
1)创建自己的 DLA 库(相关信息从上述过程中查找):
- MySQL> create database hangzhou_ots_test with dbproperties (
- catalog = 'ots',
- location = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
- instance = 'hz-tpch-1x-vol'
- );
- Query OK, 0 rows affected (0.23 sec)
- #hangzhou_ots_test --- 请注意库名, 允许字母, 数字, 下划线
- #catalog = 'ots', --- 指定为 ots, 是为了区分其他数据源, 比如 oss,rds 等
- #location = 'https://xxx' ---ots 的 endpoint, 从实例上可以看到
- #instance = 'hz-tpch-1x-vol' --- 指定 instance 名, 因为 endpoint 可以不带实例名; 最终映射到 DLA 的 schema
2)查看自己创建的库:
- MySQL> show databases;
- +------------------------------+
- | Database |
- +------------------------------+
- | hangzhou_ots_test |
- +------------------------------+
- 1 rows in set (0.22 sec)
- MySQL> show create database hangzhou_ots_test;
- +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | Database | Create Database |
- +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | hangzhou_ots_test | CREATE DATABASE `hangzhou_ots_test`
- WITH DBPROPERTIES (
- CATALOG = 'ots',
- LOCATION = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
- INSTANCE = 'hz-tpch-1x-vol'
- ) |
- +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- 1 row in set (0.31 sec)
3)查看自己的 DLA 表:
- MySQL> use hangzhou_ots_test;
- Database changed
- MySQL> show tables;
- Empty set (0.30 sec)
4)创建 DLA 表, 映射到 OTS 的表:
- MySQL> CREATE EXTERNAL TABLE `nation` (
- `N_NATIONKEY` int not NULL ,
- `N_COMMENT` varchar(100) NULL ,
- `N_NAME` varchar(100) NULL ,
- `N_REGIONKEY` int NULL ,
- PRIMARY KEY (`N_NATIONKEY`)
- );
- Query OK, 0 rows affected (0.36 sec)
- ## `N_NATIONKEY` int not NULL ---- 如果是主键的话, 必须要 not null
- ## PRIMARY KEY (`N_NATIONKEY`) ---- 务必与 ots 中的主键顺序相同; 名称的话也要对应
5)查看自己创建的表和相关的 DDL 语句:
- MySQL> show tables;
- +------------+
- | Table_Name |
- +------------+
- | nation |
- +------------+
- 1 row in set (0.35 sec)
- MySQL> show create table nation;
- +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | Table | Create Table |
- +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | nation | CREATE EXTERNAL TABLE `nation` (
- `n_nationkey` int not NULL COMMENT '',
- `n_comment` varchar(100) NULL COMMENT '',
- `n_name` varchar(100) NULL COMMENT '',
- `n_regionkey` int NULL COMMENT '',
- PRIMARY KEY (`n_nationkey`)
- )
- TBLPROPERTIES (COLUMN_MAPPING = 'n_nationkey,N_NATIONKEY; n_comment,N_COMMENT; n_name,N_NAME; n_regionkey,N_REGIONKEY;')
- COMMENT '' |
- +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- 1 row in set (0.30 sec)
6)开始查询和分析(没有做太复杂的 query; 用户可以分析自己的数据, 符合 MySQL 的语法)
- MySQL> select count(*) from nation;
- +-------+
- | count(*) |
- +-------+
- | 25 |
- +-------+
- 1 row in set (1.19 sec)
- MySQL> select * from nation;
- +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
- | n_nationkey | n_comment | n_name | n_regionkey |
- +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
- | 0 | haggle. carefully final deposits detect slyly agai | ALGERIA | 0 |
- | 1 | al foxes promise slyly according to the regular accounts. bold requests alon | ARGENTINA | 1 |
- | 2 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special | BRAZIL | 1 |
- | 3 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold | CANADA | 1 |
- | 4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d | EGYPT | 4 |
- | 5 | ven packages wake quickly. regu | ETHIOPIA | 0 |
- | 6 | refully final requests. regular, ironi | FRANCE | 3 |
- | 7 | l platelets. regular accounts x-ray: unusual, regular acco | GERMANY | 3 |
- | 8 | ss excuses cajole slyly across the packages. deposits print aroun | INDIA | 2 |
- | 9 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | INDONESIA | 2 |
- | 10 | efully alongside of the slyly final dependencies. | IRAN | 4 |
- | 11 | nic deposits boost atop the quickly final requests? quickly regula | IRAQ | 4 |
- | 12 | ously. final, express gifts cajole a | JAPAN | 2 |
- | 13 | ic deposits are blithely about the carefully regular pa | JORDAN | 4 |
- | 14 | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t | KENYA | 0 |
- | 15 | rns. blithely bold courts among the closely regular packages use furiously bold platelets? | MOROCCO | 0 |
- | 16 | s. ironic, unusual asymptotes wake blithely r | MOZAMBIQUE | 0 |
- | 17 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun | PERU | 1 |
- | 18 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | CHINA | 2 |
- | 19 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | ROMANIA | 3 |
- | 20 | ts. silent requests haggle. closely express packages sleep across the blithely | SAUDI ARABIA | 4 |
- | 21 | hely enticingly express accounts. even, final | VIETNAM | 2 |
- | 22 | requests against the platelets use never according to the quickly regular pint | RUSSIA | 3 |
- | 23 | eans boost carefully special requests. accounts are. carefull | UNITED KINGDOM | 3 |
- | 24 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | UNITED STATES | 1 |
- +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
- 25 rows in set (1.63 sec)
从图中的 id, 可以看到, 与 ots 中的数据相同:
9. 其他相关的文档参考:
Data Lake Analytics 使用场景: https://help.aliyun.com/document_detail/70380.html
OLAP on TableStore-- 基于 Data Lake Analytics 的 Serverless SQL 大数据分析 https://yq.aliyun.com/articles/618501
使用 Data Lake Analytics 从 OSS 清洗数据到 AnalyticDB:https://yq.aliyun.com/articles/623401
使用 Data Lake Analytics 分析 OSS 数据: https://help.aliyun.com/document_detail/70387.html
Data Lake Analytics 数据库的连接方式: https://help.aliyun.com/document_detail/71074.html
来源: https://yq.aliyun.com/articles/634700