Optimization SQL Statements

確定SELECT字段而不是SELECT *

如果表具有許多字段和許多行,則這將查詢大量不需要的數據。

Inefficient:

SELECT * FROM Customers

Efficient:

SELECT FirstName, LastName, Address FROM Customers

選擇更多字段以避免SELECT DISTINCT

SELECT DISTINCT是從查詢中去除重複項的便利方法,它通過對查詢中的所有字段進行GROUP來創建不同的結果。然而,為了實現該目標,需要大量的處理能力,並且數據可能會分組到不準確的程度。

Inefficient and inaccurate:

SELECT DISTINCT FirstName, LastName, Address FROM Customers

利用添加更多的字段,無需使用SELECT DISTINCT即可返回未重複的記錄。

Efficient and accurate:

SELECT FirstNanem, LastName, Address, City, State FROM Customers

使用INNER JOIN而不是WHERE創建關聯

此種join又稱作CROSS JOIN,將會產生A表格 * B表格平方總數的數據

假設,Customers有1000筆記錄,Sales有1000筆記錄,使用CROSS JOIN將產生1,000,000記錄量。

Bad:

SELECT Customers.CustomerID, Customers.Name, Sales.LastSaleDate
FROM Customers, Sales
WHERE Customers.CustomerID = Sales.CustomerID

應改用INNER JOIN,僅生成A表格相等的所需記錄。

Good:

SELECT Customers.CustomerID, Customers.Name, Sales.LasterSaleDate
FROM Customers
INNER JOIN Sales ON Customers.CustomerID = Sales.CustomerID

使用WHERE而不是HAVING條件查詢

與上述概念類似,有效的查詢,目標是僅從數據庫中提取所需的記錄。

根據SQL操作順序,在WHERE語句之後才計算HAVING語句。如果母的是根據條件過濾查詢,則WHERE會更有效。

例如,假設在2016年實現了200筆銷售,我們要查詢2016年每位客戶的銷售數量:

SELECT Customers.CustomerID, Customers.Name, COUNT(Sales.SalesID)
FROM Customers
INNER JOIN Sales ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.CustomerID, Customers.Name
HAVING Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#

該方法從Sales表中提取1000條銷售幾率,然後過濾2016年生成的200條記錄,最後對數據進行計數。

使用WHERE限制提取的記錄數量:

SELECT Customers.CustomerID, Customers.Name, COUNT(Sales.SaleID)
FROM Customers
INNER JOIN Sales ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#
GROUP BY Customers.CustomerID, Customers.Name

該方法僅提取符合WHERE條件的200筆記錄,然後對數據進行計數。HAVING子句中的第一步已被消除。

僅當在聚合字段(aggregated field)進行過濾時,才應使用HAVING。例如,使用HAVING為銷售額大於5的客戶進行過濾:

SELECT Customers.CustomerID, Customers.Name, COUNT(Sales.SalesID)
FROM Customers
INNER JOIN Sales ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#
GROUP BY Customers.CustomerID, Customers.Name
HAVING COUNT(Sales.Sales) > 5

僅使用右模糊查詢

SELECT City FROM Customers
WHERE City LIKE '%Char%'

Results: Charleston, Charlotte, and Charlton. Also, Cape Charles, Crab Orchard, and Richardson.

更好的查詢:

SELECT City FROM Customers
WHERE City LIKE 'Char%'

Results: Charleston, Charlotte, and Charlton.

使用LIMIT返回指定記錄數

SELECT Customers.CustomerID, Customers.Name, COUNT(Sales.SalesID)
FROM Customers
INNER JOIN Sales ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#
GROUP BY Customers.CustomerID, Customers.Name
LIMIT 10

在某些DBMS中,TOPLIMIT可以交互使用。

Avoiding Full Table Scans

Avoiding Tables Scans

A table scan is the reading of every row in a table and is caused by queries that don’t properly use indexes. Table scans on large tables take an excessive amount of time and cause performance problems.

Make sure that, for any queries against large tables, at least one WHERE clause condition:

  • refers to an indexed column and
  • is reasonable selective

You should be concerned primarily with queries against large tables. If you have a table with a few hundred rows, table scans are not a problem and are sometimes faster than indexed access.

Ref: Avoiding Table Scans - ORACLE

Queries with NULL Conditions

【具有NULL條件的查詢】

Oracle無法使用索引選擇NULL列值,因為NULL未儲存在索引中。以下示例將調用全表掃描:

select
   emp_name
from
   emp
where
   middle_name IS NULL;

通過約定一些特殊含義(例如:N/A-1)來代替NULL,避免全表掃描:

update
   emp
set
   middle_name to N/A
where
   middle_name IS NULL;
select
   emp_name
from
   emp
where
   middle_name = N/A;


MySQL:

If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away. This optimization does not occur in cases when the column might produce NULL anyway (for example, if it comes from a table on the right side of a LEFT JOIN).

如果WHERE子句包含聲明為NOT NULL的列的col_name IS NULL條件,則該表達式將被優化。 如果列仍然可能產生NULL(例如,如果它來自LEFT JOIN右側的表),則不會進行此優化。

Ref: MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.15 IS NULL Optimization


結論:將欄位宣告NOT NULL,避免出現NULL值。

Queries Against Unindexed Columns

【針對未索引列的查詢】

可以創建索引來提高查詢效能。


OR語句使用不當引起全表掃描】

WHERE子句中比較的兩個條件,一個有索引,一個沒索引,使用OR則會引起全表掃描。例如:

WHERE A =: 1 OR B =: 2

A有索引,B沒有索引,則比較B =: 2會重新開始全表掃描。

Ref: 会引起全表扫描的十种SQL语句_sql_码上笔记-CSDN博客

Queries with Like Conditions

【具有Like條件的查詢】

Queries that use the like clause will invoke a full-table scan if the percent sign mask is used in the leading side of the query. For example, the following clause would not cause a full-table scan, because the like mask begins with characters and the existing index will be able to service the query.

如果在查詢的開頭使用%,則使用like子句的查詢將調用全表掃描。例如,以下子句不會引起全表掃描,因為like查詢以字符(characters )開頭,並且現有索引將能夠為查詢提供服務。

select
   ename
   job,
   hiredate
from
   emp
where
   ename like BURLE%;

However, we run into full-table scans when the like mask has the percent sign in the beginning of the mask:

但是,當like查詢的開頭具有%時,我們將進行全表掃描:

select
   ename
   job,
   hiredate
from
   emp
where
   ename like %SON;

結論:右模糊會使用索引,左模糊無法使用索引。

可以使用reverse + function index的形式,變化成LIKE '...%'

create index
   ename_reverse_idx
on emp
( reverse(ename) );
select
   ename
   job,
   hiredate
from
   emp
where
   reverse(ename) like NOS%;

Queries with a Not Equals Condition

【條件不等於的查詢】

可以在Oracle SQL使用以下三種NOT EQUALS條件查詢:

  • select ename from emp where job <> 'boss'
  • select ename from emp where job != 'boss'
  • select ename from emp where job not in ('boss')

然而,SQL中不等於操作符會限制索引,引起全表掃描,即使比較的字段上有索引。

解決辦法:通過把NOT EQUALS改成OR,可以使用索引,避免全表掃描。例如:

column != 'aaa'

改成

column < 'aaa' OR column > 'aaa'

Ref: 会引起全表扫描的十种SQL语句_sql_码上笔记-CSDN博客

結論

這是其他

  1. 正確使用索引(index)
  2. 欄位NOT NULL,並給出預設值
  3. 使用正向且精確的查詢,代替反向查詢
  4. 使用UNION代替OR
  5. 盡量不在WHERE函數運算
  6. JOIN查詢比Sub-Query快,但Sub-Query具有更好的閱讀可能性

子查詢 Performance:

Reference

Optimization:

Full Table Scans:

SQL Optimize Monitoring