There are a couple of themes when looking at query performance on Redshift:

Think about table distribution and sort keys and how they affect queries. How a table is architected affects the query plans Redshift can use, so keeping this in mind helps you both define tables and query them effectively. Knowing what your distribution keys and sort keys are when joining tables can help you write better queries. The grander point is to keep access patterns in mind when architecting tables. Before making optimization decisions, think about the data volume (result set size), the query frequency, and the downstream impact of optimizing towards those operations. If you don't have a clear vision of all access patterns, start with DISTSTYLE EVEN and build a baseline of access patterns on your cluster to optimize against.

Just because your table is giant doesn't necessarily mean that the results you are extracting (or the data being scanned) are giant. Often in EDWs we are actually just pulling out the latest week's or month's data. In terms of query optimization, think in terms of how much data you are reading, not the total size of the table. Keep in mind how sort keys and the columnar architecture can make reading data from tables more efficient. Filter data as early as possible in your query: if you only need a month of data, don't pass the entire table around until the very end. While this takes more effort upfront when writing the query, habits such as this will help ensure you are using the cluster more efficiently.

AS query: A valid SELECT statement that defines the materialized view and its content. The result set from the query defines the columns and rows of the materialized view.

The sort key for the materialized view, in the format SORTKEY ( columnname. For more information, see Working with sort keys.

Re-run the query, and now Redshift scans much less data.
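The advice on starting with DISTSTYLE EVEN, sorting on the column you filter by, and filtering early can be sketched roughly as follows. This is only an illustration under stated assumptions: the table name orders_example, its columns and types, and the one-month window are hypothetical and not taken from a real schema.

```sql
-- Hypothetical table; names and types are assumptions for illustration only.
CREATE TABLE orders_example (
    o_orderkey   BIGINT,
    o_custkey    BIGINT,
    o_orderdate  DATE,
    o_totalprice DECIMAL(12,2)
)
DISTSTYLE EVEN                   -- baseline distribution until access patterns are known
COMPOUND SORTKEY (o_orderdate);  -- range filters on the date can skip sorted blocks

-- Filter on the sort key first and select only the columns that are needed,
-- so the aggregation never touches the rest of the table.
WITH recent_orders AS (
    SELECT o_custkey, o_totalprice
    FROM orders_example
    WHERE o_orderdate >= DATEADD(month, -1, CURRENT_DATE)
)
SELECT o_custkey, SUM(o_totalprice) AS total_spend
FROM recent_orders
GROUP BY o_custkey;
```

Once real access patterns are known, the distribution style and sort key in a sketch like this would be revisited against the baseline rather than kept as-is.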
In our sales dashboard we focus on orders from the most recent 12 months, so let's add an order date filter, run the query, and check how it is executed.

alter table orders alter COMPOUND sortkey (o_orderdate);

-> XN Hash Join DS_DIST_ALL_NONE (cost = 14923.
-> XN Hash Join DS_DIST_NONE (cost = 84157.

Notice the join strategy: DS_BCAST_INNER and DS_DIST_OUTER both mean that rows had to be moved between nodes, so it looks like a lot of data shuffling happened.

-> XN Hash Join DS_DIST_OUTER (cost = 14923.
   Hash Cond: ("outer".o_custkey = ("inner".c_customer_sk)::bigint)
   Hash Cond: ("outer".c_current_addr_sk = "inner".ca_address_sk)
-> XN Seq Scan on customer_address d (cost = 0.
   00 rows = 969354 width = 10)
   Filter: ((ca_country)::text = 'United States'::text)

To define a sort type, use either the INTERLEAVED or COMPOUND keyword with your CREATE TABLE or CREATE TABLE AS statement. An INTERLEAVED sort key can use a maximum of eight columns. The default COMPOUND is recommended unless your tables aren't updated regularly with INSERT, UPDATE, or DELETE.
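To check a plan like the fragments above for yourself, a query along these lines could be run under EXPLAIN. It is a sketch, not the author's original query: the join columns, the orders table, and the customer_address alias d come from the plan output, while the table name customer, the COUNT(*) projection, and the exact date expression are assumptions. The second statement is a hypothetical example of declaring an INTERLEAVED sort key at table creation.

```sql
-- Inspect the join strategy for the 12-month dashboard filter.
-- The table name "customer" is assumed; only its c_ columns appear in the plan.
EXPLAIN
SELECT COUNT(*)
FROM orders o
JOIN customer c         ON o.o_custkey = c.c_customer_sk
JOIN customer_address d ON c.c_current_addr_sk = d.ca_address_sk
WHERE d.ca_country = 'United States'
  AND o.o_orderdate >= DATEADD(month, -12, CURRENT_DATE);

-- Hypothetical table showing the INTERLEAVED keyword (at most eight columns).
CREATE TABLE customer_address_example (
    ca_address_sk BIGINT,
    ca_state      VARCHAR(2),
    ca_country    VARCHAR(20)
)
INTERLEAVED SORTKEY (ca_country, ca_state);
```

In the EXPLAIN output, labels such as DS_DIST_NONE or DS_DIST_ALL_NONE mean the join needed no redistribution, while DS_BCAST_INNER or DS_DIST_OUTER mean rows were sent across nodes.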