Run SELECT Statement Against File in Object Storage (S3, OSS and COS)
In this post, we will demonstrate how to run SELECT statement directly to a file stored in the object storage. The typical file formats are CSV and JSON file types. We will illustrate how to handle the object content selection based on each public cloud vendors, i.e. AWS, Alibaba Cloud, and Tencent Cloud. And hence, the object storage services that we will talk, are S3 for AWS, OSS for Alibaba Cloud, and COS for Tencent Cloud.
Before we start, we first need to simulate sample data, which SELECT statement will run against.
For CSV, we prepare a file with below data in it, and save this comma delimited data into a CSV format file.
SSO,Display name,Email,Mobile phone,Company,Role,Purpose,Aliyun account,Group name,Need console access, 41734287,Leo,Leo@example.com,,,,,primary,,No, 32504734,Mil,Mil@example.com,,,,,primary,,No, 56923933,Sap,Sap@example.com,,,,,primary,,No, 33261444,Tem,Tem@example.com,,,,,primary,,No, 30012677,Fas,Fas@example.com,,,,,primary,,No,
For JSON, we prepare another file with below data in it.
{ "name": "Leo", "org": { "name": "SkyCone" }, "projects": [ {"project_name":"project1", "completed":true}, {"project_name":"project2", "completed":false} ] }
Now we have finished the preparation.
Without further ado, let's start.
Tencent Cloud COS
In Tencent Cloud console, upload these files to a COS bucket.
CSV File Type
For the object that need to run SELECT statement, Move Actions --> Extract.
In the next page, configure the COS Select as below.
Or input SQL query directly.
select * from cosobject s limit 100
Click "Run SQL".
JSON File Type
For JSON Document file, select "JSON" in the File Type field.
You can run below SQL and get the same result.
select s1.name from COSObject s1
Here, the "s" or "s1" is called alias. The JSON document could be accessed by using a format like "alias.name", e.g. "s.name" or "s1.name".
For nested JSON structure, use below SELECT query to get the data.
select s1.org.name from COSObject s1
Alibaba Cloud OSS
In Alibaba Cloud OSS console, we upload the same files.
For OSS, by the way, Parquet file format is not supported.
For the object that need to run SELECT statement, Move --> Select Content.
In the next page, configure the OSS Select Content as below.
In the SQL Editor, input below SELECT statement, and we will get below result (shown in the screenshot).
select * from ossobject limit 10
JSON File Type
When attempting to run SELECT statement against a file of JSON format, select "JSON_DOCUMENT" in the JSON Display Mode field.
Paste below SELECT statement into the SQL Editor, and we will get below result (shown in the screenshot).
select s.org.name from ossobject s
To get a nested list inside the JSON, use below SELECT statement. This will extract the second (index 0 for the first, and then index 1 is for the second) element in the list of "projects".
select s.projects[1].project_name from ossobject s
The JSON file could be extended to below format.
[ {"project_name":"project1", "completed":true}, {"project_name":"project2", "completed":false} ]
For this kind of JSON data, query it using below SELECT statement.
select s[1].project_name from ossobject s
AWS S3 Select
Upload the same CSV and JSON files to an AWS S3 bucket.
CSV File Type
For the CSV object that need to run SELECT statement, click the object and go the the "Select from" tab.
In the SQL editor, input below SELECT statement, and we will get below result (shown in the screenshot).
select * from s3object s limit 5
JSON File Type
When attempting to run SELECT statement against JSON file format, select "JSON" in the File format field.
select s.projects[1].project_name from s3object s
In the SQL editor, run below SELECT statement, and we will get below result (shown in the screenshot).
References
Select 命令
SelectObject (OSS)