Amazon Redshift Database Developer Guide - Sharing data across clusters in Amazon Redshift

2023年03月30日

Sharing data across clusters in Amazon Redshift

3-part notation (consumer_database_name.schema_name.table_name)

How data sharing works in Amazon Redshift

Working with views in Amazon Redshift data sharing

View name Can this view be added to a datashare? Can a consumer create this view on datashare objects across clusters?

Regular view Yes No

Late-binding view Yes Yes

Materialized view Yes Yes, but only with a complete refresh

View name	Can this view be added to a datashare?	Can a consumer create this view on datashare objects across clusters?
Regular view	Yes	No
Late-binding view	Yes	Yes
Materialized view	Yes	Yes, but only with a complete refresh

create table event(
  eventid integer not null distkey,
  venueid smallint not null,
  catid smallint not null,
  dateid smallint not null sortkey,
  eventname varchar(200),
  starttime timestamp);

create table category(
  catid smallint not null distkey sortkey,
  catgroup varchar(10),
  catname varchar(10),
  catdesc varchar(50));
  
create table sales(
  salesid integer not null,
  listid integer not null distkey,
  sellerid integer not null,
  buyerid integer not null,
  eventid integer not null,
  dateid smallint not null sortkey,
  qtysold smallint not null,
  pricepaid decimal(8,2),
  commission decimal(8,2),
  saletime timestamp);

copy event 
from 's3://skycone-lab-us-west-2/tickit/allevents_pipe.txt'
credentials 'aws_iam_role=arn:aws:iam::<111122223333>:role/myRedshiftRole'
delimiter '|' timeformat 'YYYY-MM-DD HH:MI:SS' region 'us-west-2';

copy category 
from 's3://skycone-lab-us-west-2/tickit/category_pipe.txt'
credentials 'aws_iam_role=arn:aws:iam::<111122223333>:role/myRedshiftRole'
delimiter '|' region 'us-west-2';

copy sales 
from 's3://skycone-lab-us-west-2/tickit/sales_tab.txt'
credentials 'aws_iam_role=arn:aws:iam::<111122223333>:role/myRedshiftRole'
delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';

Create view.

create view myevent_regular_vw as select eventid, eventname from event;
create or replace view myevent_regular_vw as select eventid, eventname from event
where eventname = 'LeAnn Rimes';

Shows the output of a regular view that is supported with data sharing. (regular view)

SELECT * FROM dev.public.myevent_regular_vw 
ORDER BY eventid LIMIT 5;

Result:

eventid,eventname
3835,LeAnn Rimes
3967,LeAnn Rimes
4856,LeAnn Rimes
4948,LeAnn Rimes
5131,LeAnn Rimes

Show the output of a late-binding view that is supported with data sharing. (late-binding view)
Create late-binding view.

create view event_lbv as select * from public.event
with no schema binding;

SELECT * FROM dev.public.event_lbv 
ORDER BY eventid LIMIT 5;

Result:

eventid,venueid,catid,dateid,eventname,starttime
1,305,8,1851,Gotterdammerung,2008-01-25 14:30:00
2,306,8,2114,Boris Godunov,2008-10-15 20:00:00
3,302,8,1935,Salome,2008-04-19 14:30:00
4,309,8,2090,La Cenerentola (Cinderella),2008-09-21 14:30:00
5,302,8,1982,Il Trovatore,2008-06-05 19:00:00

Show the output of a materialized view that is supported with data sharing. (materialized view)
Create materialized view.

CREATE MATERIALIZED VIEW tickets_mv AS
    select   catgroup,
    sum(qtysold) as sold
    from     category c, event e, sales s
    where    c.catid = e.catid
    and      e.eventid = s.eventid
    group by catgroup;

SELECT * FROM dev.public.tickets_mv;

Result:

  catgroup | qtysold
 ----------+---------
  Concerts | 195444
    Shows  | 149905

You can maintain common tables across all tenants in a producer cluster. You can also share subsets of data filtered by dimension columns, such as tenant_id (account_id or namespace_id), to consumer clusters. To do this, you can define a view on the base table with a filter on these ID columns, for example current_aws_account = tenant_id. On the consumer side, when you query the view, you see only the rows that qualify for your account. To do this, you can use the Amazon Redshift context functions current_aws_account and current_namespace.

The following query returns the account ID in which the current Amazon Redshift cluster resides. You can run this query if you are connected to Amazon Redshift.

select current_user, current_aws_account;

Result:

current_user,current_aws_account
admin,<111122223333>

The following query returns the namespace of the current Amazon Redshift cluster. You can run this query if you are connected to the database.

select current_user, current_namespace;

Result:

current_user,current_namespace
admin,cebae53b-deb3-4948-9973-03e43f6f9e36

Managing permissions for datashares in Amazon Redshift

Creates a datashare salesshare and adds a table public.sales to salesshare. It also grants usage permissions on salesshare to a cluster namespace.

ERROR: Schema public not part of the datashare salesshare. Please add schema and then add objects under it [ErrorId: 1-64250475-3ae5cea028b4a35e09bc0e7c]

如果出现上面的报错，执行SQL：

ALTER DATASHARE salesshare ADD SCHEMA public;

CREATE DATASHARE salesshare;

ALTER DATASHARE salesshare ADD TABLE public.sales;

GRANT USAGE ON DATASHARE salesshare TO NAMESPACE 'ab23d5cc-82bb-49b3-af42-b02c15e34291';

再创建一个Redshift作为consumer.

Creating databases from datashares
To start querying data in the datashare, create a database from a datashare. You can create only one database from a specified datashare.

1. Open the Amazon Redshift console.
2. On the navigation menu, choose Clusters, then choose your cluster. The cluster details page appears.
3. Choose Datashares. The datashare list appears.
4. In the Datashares from other namespaces and AWS accounts section, choose a datashare that you want to create databases from, then click Create database from datashare. The Create database from datashare page appears.

5. In the Database name, specify a database name. 这里，Database name使用“sales_db”。The database name must be 1–64 alphanumeric characters (lowercase only) and it can't be a reserved word.
6. Choose Create.

After the database is created, you can query data in the database.

查看它的namespace的ID。

Grant permissions to access a shared table at the database level and schema level.

CREATE USER Bob PASSWORD 'sha256|******!';
CREATE GROUP Analyst_group;

On the consumer side, a consumer cluster administrator can determine which users and groups should get access to the shared data. An administrator can control access at the database or schema level. To control access at the schema level, the administrator must create an external schema from the Amazon Redshift database created from the datashare.

The following example grants permissions to access a shared table at the database level and schema level.

GRANT USAGE ON DATABASE sales_db TO Bob;

CREATE EXTERNAL SCHEMA sales_schema FROM REDSHIFT DATABASE sales_db;

GRANT USAGE ON SCHEMA sales_schema TO GROUP Analyst_group;

References

-

Category: data Tags: Redshift public

Sky Cone

Amazon Redshift Database Developer Guide - Sharing data across clusters in Amazon Redshift