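When should I use a composite index?
- When should I use a composite index in a database?
- What are the performance ramifications of using a composite index?
- Why should I use a composite index?
For example, I have a homes table: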
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
PRIMARY KEY (`home_id`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`)
) ENGINE=InnoDB ;
Does it make sense for me to use a composite index for both geolat and geolng, such that:
I replace:
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
with:
KEY `geolat_geolng` (`geolat`, `geolng`)
If so:
- Why?
- What are the performance ramifications of using a composite index?
UPDATE:
Since many people have said it depends entirely on the queries I perform, below is the most common query:
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
UPDATE 2:
With the following database schema:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`primary_photo_group_id` int(10) unsigned NOT NULL default '0',
`customer_id` bigint(20) unsigned NOT NULL,
`account_type_id` int(11) NOT NULL,
`address` varchar(128) collate utf8_unicode_ci NOT NULL,
`city` varchar(64) collate utf8_unicode_ci NOT NULL,
`state` varchar(2) collate utf8_unicode_ci NOT NULL,
`zip` mediumint(8) unsigned NOT NULL,
`price` mediumint(8) unsigned NOT NULL,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`num_of_beds` tinyint(3) unsigned NOT NULL,
`num_of_baths` decimal(3,1) unsigned NOT NULL,
`num_of_floors` tinyint(3) unsigned NOT NULL,
`description` text collate utf8_unicode_ci,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
`display_status` tinyint(1) NOT NULL,
`date_listed` timestamp NOT NULL default CURRENT_TIMESTAMP,
`contact_email` varchar(100) collate utf8_unicode_ci NOT NULL,
`contact_phone_number` varchar(15) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`home_id`),
KEY `customer_id` (`customer_id`),
KEY `city` (`city`),
KEY `num_of_beds` (`num_of_beds`),
KEY `num_of_baths` (`num_of_baths`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
KEY `account_type_id` (`account_type_id`),
KEY `display_status` (`display_status`),
KEY `sqft` (`sqft`),
KEY `price` (`price`),
KEY `primary_photo_group_id` (`primary_photo_group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8 ;
Using the following SQL:
EXPLAIN SELECT homes.home_id,
address,
city,
state,
zip,
price,
sqft,
year_built,
account_type_id,
num_of_beds,
num_of_baths,
geolat,
geolng,
photo_id,
photo_url_dir
FROM homes
LEFT OUTER JOIN home_photos ON homes.home_id = home_photos.home_id
AND homes.primary_photo_group_id = home_photos.home_photo_group_id
AND home_photos.home_photo_type_id = 2
WHERE homes.display_status = true
AND homes.geolat BETWEEN -100 AND 100
AND homes.geolng BETWEEN -100 AND 100
EXPLAIN returns:
id  select_type  table        type  possible_keys                                   key                  key_len  ref                           rows  Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE       homes        ref   geolat,geolng,display_status                    display_status       1        const                         2     Using where
1   SIMPLE       home_photos  ref   home_id,home_photo_type_id,home_photo_group_id  home_photo_group_id  4        homes.primary_photo_group_id  4
I don't quite understand how to read the EXPLAIN output. Does this look good or bad? Right now, I am NOT using a composite index for geolat and geolng. Should I be?
You should use a composite index when you are running queries that benefit from it. A composite index that looks like this:
index( column_A, column_B, column_C )
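will benefit a query that uses those fields for joining, filtering, and sometimes selecting. It will also benefit queries that use left-most subsets of the columns in that composite. So the above index will also satisfy queries that need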
index( column_A, column_B, column_C )
index( column_A, column_B )
index( column_A )
But it will not help (at least not directly; it may help partially if there are no better indexes) for queries that need
index( column_A, column_C )
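Notice how column_B is missing.
In your original example, a composite index on two dimensions will mostly benefit queries that filter on both dimensions or on the leftmost dimension by itself, but not on the rightmost dimension by itself. If you're always querying both dimensions, a composite index is the way to go, and it most probably doesn't matter which column comes first.
Imagine you have the following three queries:
Query I:
SELECT * FROM homes WHERE `geolat`=42.9 AND `geolng`=36.4
Query II:
SELECT * FROM homes WHERE `geolat`=42.9
Query III:
SELECT * FROM homes WHERE `geolng`=36.4
If you have a separate index per column, all three queries use indexes. In MySQL, if you have a composite index (geolat, geolng), only Query I and Query II (which uses the first part of the composite index) use the index. In this case, Query III requires a full table scan.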
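The Multiple-Column Indexes section of the manual clearly explains how multiple-column indexes work, so I won't retype the manual.
From the MySQL Reference Manual page:
A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.
If you use separate indexes for the geolat and geolng columns, you have two different indexes on your table, which you can search independently.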
INDEX geolat
-----------
VALUE RRN
36.4 1
36.4 8
36.6 2
37.8 3
37.8 12
41.4 4
INDEX geolng
-----------
VALUE RRN
26.1 1
26.1 8
29.6 2
29.6 3
30.1 12
34.7 4
If you use a composite index, you have only one index for both columns:
INDEX (geolat, geolng)
-----------
VALUE RRN
36.4,26.1 1
36.4,26.1 8
36.6,29.6 2
37.8,29.6 3
37.8,30.1 12
41.4,34.7 4
RRN is the relative record number (to simplify, you can think of it as the ID). The first two indexes are built separately and the third is composite. As you can see, you cannot search by geolng alone on the composite index, since it is ordered by geolat first; however, it is possible to search by geolat or by "geolat AND geolng" (since geolng is the second-level key).
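To make the tables above concrete, here is a minimal Python sketch (not MySQL code) that models the composite index as a sorted list of tuples, using the sample values from the tables. The function names are made up for illustration, and binary search stands in for the B-tree lookup.

```python
from bisect import bisect_left, bisect_right

# (geolat, geolng, RRN) entries, kept sorted the way the composite index is.
composite = sorted([
    (36.4, 26.1, 1), (36.4, 26.1, 8), (36.6, 29.6, 2),
    (37.8, 29.6, 3), (37.8, 30.1, 12), (41.4, 34.7, 4),
])

def seek_geolat(lat):
    """Binary search works: all entries with this geolat are adjacent."""
    lo = bisect_left(composite, (lat,))
    hi = bisect_right(composite, (lat, float("inf")))
    return [rrn for _, _, rrn in composite[lo:hi]]

def seek_geolat_geolng(lat, lng):
    """Binary search works: within one geolat, entries are ordered by geolng."""
    lo = bisect_left(composite, (lat, lng))
    hi = bisect_right(composite, (lat, lng, float("inf")))
    return [rrn for _, _, rrn in composite[lo:hi]]

def scan_geolng(lng):
    """No shortcut: equal geolng values are scattered, so every entry is checked."""
    return [rrn for _, entry_lng, rrn in composite if entry_lng == lng]

print(seek_geolat(37.8))               # [3, 12]
print(seek_geolat_geolng(36.4, 26.1))  # [1, 8]
print(scan_geolng(29.6))               # [2, 3]
```

The first two lookups touch only a narrow slice of the list, while the geolng-only lookup has to walk all of it, which is the list-based analogue of a full scan.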
Also, have a look at the How MySQL Uses Indexes section of the manual.
There could be a misconception about what a composite index does. Many people think that a composite index can be used to optimise a search query as long as the WHERE clause covers the indexed columns, in your case geolat and geolng. Let's delve deeper:
I believe your data on the coordinates of homes would be random decimals, such as:
home_id geolat geolng
1 20.1243 50.4521
2 22.6456 51.1564
3 13.5464 45.4562
4 55.5642 166.5756
5 24.2624 27.4564
6 62.1564 24.2542
...
Since geolat and geolng values hardly ever repeat, a composite index on geolat and geolng would look something like this:
index_id geolat geolng
1 20.1243 50.4521
2 20.1244 61.1564
3 20.1251 55.4562
4 20.1293 66.5756
5 20.1302 57.4564
6 20.1311 54.2542
...
Therefore, for a range query on both columns, the second column of the composite index is basically useless! The speed of your query with a composite index is probably going to be similar to that with an index on just the geolat column.
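A rough way to see this effect is to model the composite index as a sorted list in Python and count how many entries a bounding-box query has to touch. The grid of integer coordinates and the name `box_query` are assumptions made purely for illustration; they are not real home data or MySQL internals.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index on (geolat, geolng): a 100 x 100 grid of integer coordinates.
entries = sorted((lat, lng) for lat in range(100) for lng in range(100))

def box_query(lat_lo, lat_hi, lng_lo, lng_hi):
    # The ordering of the index only narrows the leading column ...
    start = bisect_left(entries, (lat_lo,))
    stop = bisect_right(entries, (lat_hi, float("inf")))
    examined = entries[start:stop]
    # ... so geolng has to be checked entry by entry inside that slice.
    matched = [e for e in examined if lng_lo <= e[1] <= lng_hi]
    return len(examined), len(matched)

examined, matched = box_query(20, 29, 50, 59)
print(examined, matched)  # 1000 100
```

In this model the query reads every entry in the latitude band and throws most of them away, which is the same amount of work an index on geolat alone would cause.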
As mentioned by Will, MySQL provides spatial extension support. A spatial point is stored in a single column instead of two separate lat and lng columns, and a spatial index can be applied to such a column. However, based on my personal experience, the efficiency could be overrated. It could be that a spatial index does not resolve the two-dimensional problem but merely speeds up the search using R-Trees with quadratic splitting.
The trade-off is that a spatial point consumes much more memory, as it uses eight-byte double-precision numbers to store coordinates. Correct me if I am wrong.
Composite indexes are useful for
- 0 or more "=" clauses, plus
- at most one range clause.
A composite index cannot handle two ranges. I discuss this further in my index cookbook.
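The rule above can be written down as a small Python sketch. The function name `seek_columns` and the predicate labels are hypothetical, and the sketch is only a rule of thumb: a real optimizer has more options than this models.

```python
def seek_columns(index_columns, predicates):
    """Return the leading index columns that can narrow the search.

    `predicates` maps a column name to "=" or "range".
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # a column with no predicate ends the usable prefix
        used.append(column)
        if kind == "range":
            break  # nothing after the first range can narrow the search
    return used

# Two ranges: only the first one narrows the search.
print(seek_columns(["geolat", "geolng"],
                   {"geolat": "range", "geolng": "range"}))
# ['geolat']

# One "=" followed by one range: both columns are used.
print(seek_columns(["display_status", "geolat", "geolng"],
                   {"display_status": "=", "geolat": "range", "geolng": "range"}))
# ['display_status', 'geolat']
```

The same function also reproduces the leftmost-prefix behavior: a predicate on geolng alone yields an empty list for an index that starts with geolat.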
Find nearest -- If the question is really about optimizing
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
then no index can really handle both dimensions.
Instead, one must 'think out of the box'. If one dimension is implemented via partitioning and the other is implemented by carefully picking the PRIMARY KEY, one can get significantly better efficiency for very large tables of lat/lng lookups. My latlng blog goes into the details of how to implement "find nearest" on the globe. It includes code.
The PARTITIONs are stripes of latitude ranges. The PRIMARY KEY deliberately starts with longitude so that the useful rows are likely to be in the same block. A Stored Routine orchestrates the messy code for doing ORDER BY ... LIMIT ... and for growing the 'square' around the target until you have enough coffee shops (or whatever). It also takes care of the great-circle calculations and handling the dateline and poles.
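The "growing square" idea can be sketched in a few lines of Python. This is not the code from the blog mentioned above: the names `in_square` and `find_nearest` are made up, the list scan stands in for the partitioned table lookup, and the distance is a flat-plane approximation that ignores great-circle math, the dateline, and the poles.

```python
import math

def in_square(points, lat, lng, half):
    """Stand-in for the bounding-box query the table layout is meant to speed up."""
    return [p for p in points
            if abs(p[0] - lat) <= half and abs(p[1] - lng) <= half]

def find_nearest(points, lat, lng, wanted, half=0.01, limit=360.0):
    def dist(p):
        return math.hypot(p[0] - lat, p[1] - lng)

    # Double the square until it holds enough points (or covers everything).
    box = in_square(points, lat, lng, half)
    while len(box) < wanted and half < limit:
        half *= 2
        box = in_square(points, lat, lng, half)
    box.sort(key=dist)

    if len(box) >= wanted:
        # Corners of a square are farther away than its sides, so re-check
        # a square that fully contains the circle through the last hit.
        radius = dist(box[wanted - 1])
        box = sorted(in_square(points, lat, lng, radius), key=dist)
    return box[:wanted]

shops = [(0.0, 0.0), (0.009, 0.009), (0.0, 0.0105), (5.0, 5.0)]
print(find_nearest(shops, 0.0, 0.0, 2))  # [(0.0, 0.0), (0.0, 0.0105)]
```

The final re-check matters in the example: the first square already contains two points, but one of them sits in a corner and is farther away than a point just outside the square's edge.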
Composite indexes are very powerful as they:
- Enforce structure integrity
- Enable sorting on a FILTERED id
ENFORCE STRUCTURE INTEGRITY
Composite indexes are not just another type of index; they can provide NECESSARY structure to a table by enforcing integrity as the Primary Key.
MySQL's InnoDB supports clustering, and the following example illustrates why a composite index may be necessary.
To create a friends table (i.e. for a social network) we need 2 columns: user_id, friend_id.
Table Structure
user_id (medium_int)
friend_id (medium_int)
Primary Key -> (user_id, friend_id)
By definition, a Primary Key (PK) is unique, so with a composite PK InnoDB will automatically check that no duplicate of (user_id, friend_id) exists when a new record is added. This is the expected behavior, as no user should have more than 1 record (relationship link) with friend_id = 2, for instance.
Without a composite PK, we can create this schema using a surrogate key:
user_friend_id
user_id
friend_id
Primary Key -> (user_friend_id)
Now, whenever a new record is added, we will have to check that a prior record with the combination user_id, friend_id does not already exist.
As such, a composite index can enforce structure integrity.
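A quick way to watch a composite primary key reject a duplicate pair is Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for InnoDB (an assumption made for the sake of a runnable example); the uniqueness behavior of the composite key is the point being shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends ("
    " user_id INTEGER NOT NULL,"
    " friend_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, friend_id))"
)
conn.execute("INSERT INTO friends VALUES (1, 2)")
conn.execute("INSERT INTO friends VALUES (1, 3)")  # same user, other friend: accepted

try:
    conn.execute("INSERT INTO friends VALUES (1, 2)")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM friends").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

No application-side "does this pair already exist?" query is needed; the key itself refuses the second (1, 2) row.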
ENABLE SORTING ON A FILTERED ID
It is very common to sort a set of records by the post's time (timestamp or datetime). Usually, this means posts on a given id. Here is an example:
Table User_Wall_Posts (think of Facebook's wall posts)
user_id (medium_int)
timestamp (timestamp)
author_id (medium_int)
comment_post (text)
Primary Key -> (user_id, timestamp, author_id)
We want to query and find all posts for user_id = 10 and sort the comment posts by timestamp (date).
SQL QUERY
SELECT * FROM User_Wall_Posts WHERE user_id = 10 ORDER BY timestamp DESC
The composite PK enables MySQL to filter and sort the results using the index; MySQL will not have to use a temporary file or filesort to fetch the results. Without a composite key, this would not be possible and would cause a very inefficient query.
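Why the sort comes for free can be seen in a small Python model of the key order. The sample rows and the name `wall_posts_desc` are invented for illustration; the sorted list plays the role of the composite primary key.

```python
from bisect import bisect_left, bisect_right

# Entries ordered by (user_id, timestamp, author_id), as the composite PK would be.
posts = sorted([
    (10, 1700000300, 7), (9, 1700000100, 4), (10, 1700000100, 3),
    (11, 1700000200, 5), (10, 1700000200, 8),
])

def wall_posts_desc(user_id):
    lo = bisect_left(posts, (user_id,))
    hi = bisect_right(posts, (user_id, float("inf")))
    # The slice for one user is already in timestamp order, so reading it
    # backwards gives ORDER BY timestamp DESC without a separate sort.
    return posts[lo:hi][::-1]

print(wall_posts_desc(10))
# [(10, 1700000300, 7), (10, 1700000200, 8), (10, 1700000100, 3)]
```

With an index on user_id alone, the matching rows would come back in no particular timestamp order and would need an extra sorting step.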
As such, composite keys are very powerful and suit more than the simple problem of "I want to search for column_a, column_b, so I will use composite keys." For my current database schema, I have just as many composite keys as single keys. Don't overlook a composite key's use!
There is no black-and-white, one-size-fits-all answer.
You should use a composite index when your query workload would benefit from one.
You need to profile your query workload in order to determine this.
A composite index comes into play when queries can be satisfied entirely from that index (that is, when it acts as a covering index).
UPDATE (in response to the edit to the posted question): If you are selecting * from the table, the composite index may or may not be used. You will need to run EXPLAIN to be sure.
To do spatial searches, you need an R-Tree algorithm, which allows searching geographical areas very quickly. Exactly what you need for this job.
Some databases have spatial indexes built in. A quick Google search shows MySQL 5 has them (and judging by your SQL, I'm guessing you're using MySQL).
A composite index can be useful when you want to optimise a GROUP BY clause (check this article: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html). Please pay attention:
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index).
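The reason key order matters for GROUP BY can be illustrated with Python's itertools.groupby, which, like a scan over an ordered index, only merges neighbouring equal keys. The (city, num_of_beds) index and its sample values are hypothetical.

```python
from itertools import groupby

# Hypothetical composite index on (city, num_of_beds), stored in key order.
index_entries = sorted([
    ("Austin", 3), ("Boston", 2), ("Austin", 2), ("Boston", 2), ("Austin", 3),
])

# Equal keys are adjacent, so one pass over the index yields every group.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(index_entries)]
print(counts)
# [(('Austin', 2), 1), (('Austin', 3), 2), (('Boston', 2), 2)]
```

If the entries were stored in no particular order, as with a hash index, equal keys would not be neighbours and a single pass like this would split them into several partial groups.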
I'm with @Mitch: it depends entirely on your queries. Fortunately you can create and drop indexes at any time, and you can prepend the EXPLAIN keyword to your queries to see whether the query analyzer uses the indexes.
If you'll be looking up an exact lat/long pair, this index would likely make sense. But you're probably going to be looking for homes within a certain distance of a particular place, so your queries will look something like this (see source):
select *, sqrt( pow(h2.geolat - h1.geolat, 2)
+ pow(h2.geolng - h1.geolng, 2) ) as distance
from homes h1, homes h2
where h1.home_id = 12345 and h2.home_id != h1.home_id
order by distance
and the index very likely won't be helpful at all. For geospatial queries, you need something like this.
Update: with this query:
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
The query analyzer could use an index on geolat alone, or an index on geolng alone, or possibly both indexes. I don't think it would use a composite index. But it's easy to try out each of these permutations on a real data set and then (a) see what EXPLAIN tells you and (b) measure the time the query really takes.
Reference URL: https://stackoverflow.com/questions/1823685/when-should-i-use-a-composite-index