OPTIMIZE TABLE reorganizes the physical storage of table data and associated index data, to reduce storage space and improve I/O efficiency when accessing the table. The exact changes made to each table depend on the storage engine used by that table.
Use OPTIMIZE TABLE in these cases, depending on the type of table:
- After doing substantial insert, update, or delete operations on an InnoDB table that has its own .ibd file because it was created with the innodb_file_per_table option enabled. The table and indexes are reorganized, and disk space can be reclaimed for use by the operating system.
- After doing substantial insert, update, or delete operations on columns that are part of a FULLTEXT index in an InnoDB table. Set the configuration option innodb_optimize_fulltext_only=1 first. To keep the index maintenance period to a reasonable time, set the innodb_ft_num_word_optimize option to specify how many words to update in the search index, and run a sequence of OPTIMIZE TABLE statements until the search index is fully updated.
- After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly. A combined sketch of these cases follows this list.
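A minimal sketch of the InnoDB cases, assuming a hypothetical table named articles that has a FULLTEXT index (the MyISAM and ARCHIVE case uses the same plain OPTIMIZE TABLE statement):

    -- Rebuild the table to defragment it and reclaim space:
    OPTIMIZE TABLE articles;

    -- Maintain only the FULLTEXT index instead of rebuilding the whole table:
    SET GLOBAL innodb_optimize_fulltext_only = ON;
    SET GLOBAL innodb_ft_num_word_optimize = 2000;  -- words processed per run (the default)
    OPTIMIZE TABLE articles;  -- repeat until the search index is fully updated
    SET GLOBAL innodb_optimize_fulltext_only = OFF;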
By default, OPTIMIZE TABLE does not work for tables created using storage engines other than InnoDB, MyISAM, or ARCHIVE, and returns a result indicating this lack of support. You can make OPTIMIZE TABLE work for other storage engines by starting mysqld with the --skip-new option. In this case, OPTIMIZE TABLE is just mapped to ALTER TABLE.
OPTIMIZE TABLE catches and throws any errors that occur while copying table statistics from the old file to the newly created file. For example, if the user ID of the owner of the .MYD or .MYI file is different from the user ID of the mysqld process, OPTIMIZE TABLE generates a "cannot change ownership of the file" error unless mysqld is started by the root user.
For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, which rebuilds the table to update index statistics and free unused space in the clustered index. This is displayed in the output of OPTIMIZE TABLE when you run it on an InnoDB table, as shown here:
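A representative run (the database and table names are illustrative):

    mysql> OPTIMIZE TABLE test.t1\G
    *************************** 1. row ***************************
       Table: test.t1
          Op: optimize
    Msg_type: note
    Msg_text: Table does not support optimize, doing recreate + analyze instead
    *************************** 2. row ***************************
       Table: test.t1
          Op: optimize
    Msg_type: status
    Msg_text: OK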
OPTIMIZE TABLE uses online DDL for regular and partitioned InnoDB tables, which reduces downtime for concurrent DML operations. The table rebuild triggered by OPTIMIZE TABLE is completed in place. An exclusive table lock is only taken briefly during the prepare phase and the commit phase of the operation. During the prepare phase, metadata is updated and an intermediate table is created. During the commit phase, table metadata changes are committed.
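The same rebuild can also be requested directly; a sketch, assuming a table t1. Spelling out ALGORITHM and LOCK makes the online behavior a hard requirement: the statement fails rather than falling back to a blocking table copy.

    ALTER TABLE t1 FORCE, ALGORITHM=INPLACE, LOCK=NONE;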
Make space: When you edit a post or page on your website, WordPress automatically saves the new revision to the database. If you edit things a few times (and particularly if the post is long), your database soon gets clogged up with old revisions that just sit there, taking up valuable space. WP-Optimize Cache removes these unnecessary post revisions, freeing up valuable megabytes of data and increasing speed and efficiency. It also cleans up your comments table, removing all the spam and unapproved comments that have built up, with a single click.
Take control: WP-Optimize Cache reports on exactly which of your database tables have overhead and wasted space, giving you the insight, control and power to keep your website neat, fast and efficient.
Amazon Redshift has several features that automate performance tuning: automatic vacuum delete, automatic table sort, automatic analyze, and Amazon Redshift Advisor for actionable insights into optimizing cost and performance. In addition, automatic workload management (WLM) makes sure that you use cluster resources efficiently, even with dynamic and unpredictable workloads. Amazon Redshift can even automatically refresh and rewrite materialized views, speeding up query performance by orders of magnitude with pre-computed results. These capabilities use machine learning (ML) to adapt as your workloads shift, enabling you to get insights faster without spending valuable time managing your data warehouse.
Although Amazon Redshift provides industry-leading performance out of the box for most workloads, some queries benefit even more from pre-sorting and rearranging how data is physically laid out on disk. In Amazon Redshift, you can set the proper sort and distribution keys for tables and achieve significant performance improvements for the most demanding workloads.
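For illustration, a fact table might be declared with explicit keys like this (the table and column names are assumptions, not from the original post):

    CREATE TABLE sales (
      sale_id      BIGINT,
      customer_key INTEGER,
      sale_date    DATE,
      amount       DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_key)  -- co-locate rows that join on customer_key
    SORTKEY (sale_date);    -- order rows on disk by the common filter column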
As users query the data in Amazon Redshift, automatic table optimization collects query statistics, which are analyzed by a machine learning service to generate recommendations for sort and distribution keys. These recommendations are then applied to the respective Amazon Redshift tables automatically, using online ALTER statements.
Amazon Redshift takes advantage of the new Automatic table optimization feature to apply the optimizations recommended by the Advisor to the target tables. The conversion is run by the automation during periods of low workload intensity to minimize the impact on user queries. This can be verified by running the following query:
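The blog's original query is not reproduced here; a sketch against the SVL_AUTO_WORKER_ACTION system view, which records the actions the automation has taken (the column subset is assumed from the documented view):

    SELECT table_id,
           type,       -- sortkey or distkey
           status,
           eventtime
    FROM svl_auto_worker_action
    ORDER BY eventtime DESC;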
You can easily convert existing tables to Automatic table optimization by using the ALTER TABLE command to switch the sort and distribution styles to AUTO, so that the tables can be automatically optimized by Amazon Redshift. See the following code:
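A sketch, using the TPC-H customer table discussed later in this post:

    ALTER TABLE customer ALTER DISTSTYLE AUTO;
    ALTER TABLE customer ALTER SORTKEY AUTO;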
Amazon Redshift continuously learns from workloads, and optimizations are recorded in the svv_alter_table_recomendations system view. When an optimization is available, it runs within a defined frequency, as well as in periods of low workload intensity, to minimize the impact on user queries. For more information about the lead time of applying the recommendations, see Creating tables in the Amazon Redshift documentation.
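The pending recommendations can be inspected directly from that view; a sketch (the view name, including its spelling, is as documented by AWS):

    SELECT type,          -- diststyle or sortkey
           database,
           table_id,
           ddl,            -- the ALTER statement the automation intends to run
           auto_eligible   -- whether it can be applied automatically
    FROM svv_alter_table_recomendations;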
Automatic table optimization for Amazon Redshift is a new capability that applies sort and distribution keys without the need for administrator intervention. Using automation to tune the design of tables lets you get started more easily and decreases the amount of administrative effort. Automatic table optimization enables easy management of large numbers of tables in a data warehouse because Amazon Redshift self-learns, self-optimizes, and adapts to your actual workload to deliver the best possible performance.
To achieve the best possible query performance, data needs to be distributed across the compute nodes in a way that is optimal to the specific workload that is being run on the cluster. For example, the optimal way to distribute data for tables that are commonly joined is to store rows with matching join keys on the same nodes. This enables Amazon Redshift to join the rows locally on each node without having to move data around the nodes. Data distribution also affects the performance of GROUP BY operations.
Both tables have the customer_key column set as the distribution key (DISTKEY). When rows are inserted into these tables, Amazon Redshift distributes them across the cluster based on the values of the customer_key column. For example, all rows with a customer_key of 100 are moved to the same node, and likewise all rows with a customer_key of 101 are also moved to the same node (this may not be the same node as the key of 100), and so on for every row in the table.
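A sketch of two such co-located tables (schemas simplified; names are illustrative):

    CREATE TABLE customer (
      customer_key INTEGER,
      name         VARCHAR(25)
    )
    DISTKEY (customer_key);

    CREATE TABLE orders (
      order_key    BIGINT,
      customer_key INTEGER,  -- same distribution key: matching rows land on the same node
      total        DECIMAL(12,2)
    )
    DISTKEY (customer_key);

    -- Joins on customer_key can now be resolved locally on each node:
    SELECT c.name, SUM(o.total)
    FROM customer c
    JOIN orders o ON o.customer_key = c.customer_key
    GROUP BY c.name;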
For distribution keys, Amazon Redshift constructs a graph representation of the SQL join history, and uses this graph to calculate the optimal table distribution to reduce data transfer across nodes when joining tables (see the following diagram). You can find more details of this process in the scientific paper Fast and Effective Distribution-Key Recommendation for Amazon Redshift.
When an optimal configuration is found, ATO implements the new keys in the background, redistributing rows across the cluster and sorting tables. For sort keys, another Amazon Redshift feature, automatic table sort, handles physically sorting the rows in the table, and maintains the sort order over time.
TPC-H is an industry-standard benchmark designed to measure ad hoc query performance for business analytics workloads. It consists of eight tables and 22 queries designed to simulate a real-world decision support system. For full details of TPC-H, see TPC BENCHMARK H.
The following diagram shows the eight tables in the TPC-H data model. The Cloud DW Benchmark is derived from TPC-H and uses the same set of tables, queries, and a 30 TB dataset in Amazon Simple Storage Service (Amazon S3), which was generated using the official TPC-H data generator. This provides an easy way to set up and run the test on your own Amazon Redshift cluster.
The following script is an untuned version of the TPC-H ddl.sql file that creates all required tables for the test and loads them with the COPY command. To untune the tables, all the sort and distribution keys have been removed. The original tuned version is available on the amazon-redshift-utils GitHub repo.
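The full untuned script is too long to reproduce here; a hedged excerpt showing its shape for one table (the S3 bucket and IAM role are placeholders):

    -- No SORTKEY or DISTKEY clauses: physical layout is left for ATO to tune later
    CREATE TABLE region (
      r_regionkey INTEGER NOT NULL,
      r_name      CHAR(25),
      r_comment   VARCHAR(152)
    );

    COPY region
    FROM 's3://<bucket>/tpch/30T/region/'
    IAM_ROLE '<iam-role-arn>'
    GZIP DELIMITER '|';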
Inspecting the loaded tables (for example, with the SVV_TABLE_INFO query sketched below) shows that some optimizations have already been implemented by the COPY command. The smaller tables have the diststyle set to ALL (replicating all rows across all data nodes), and the larger tables are set to EVEN (a round-robin distribution of rows across the data nodes). Also, the encoding (compression) has been set for all the tables.
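A sketch of such a metadata query (SVV_TABLE_INFO is the documented system view; the column list is trimmed for readability):

    SELECT "table",
           diststyle,  -- e.g. ALL, EVEN, or KEY(c_custkey)
           sortkey1,
           encoded,    -- Y if compression encodings are set
           size,       -- in 1 MB blocks
           tbl_rows
    FROM svv_table_info
    ORDER BY size DESC;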
The per-column encoding of the customer table can be checked with PG_TABLE_DEF, as sketched below. Because Amazon Redshift is a columnar database, the compression can be set differently for each column, as opposed to a row-based database, which can only set compression at the row level.
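A sketch (note that PG_TABLE_DEF only lists tables in schemas on the current search_path):

    SELECT "column", type, encoding
    FROM pg_table_def
    WHERE tablename = 'customer';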
Amazon Redshift automatically monitors the workload on the cluster and uses AI algorithms to calculate the optimal sort and distribution keys. Then ATO implements the table changes online, without disrupting running queries.