Welcome Guest ( Log In | Register )

Outline · [ Standard ] · Linear+

> How SQL server indexing work

views
     
TSnarf03
post Sep 17 2020, 12:32 AM

Look at all my stars!!
*******
Senior Member
4,058 posts

Joined: Dec 2004
From: Metro Prima, Kuala Lumpur, Malaysia, Earth, Sol


QUOTE(malleus @ Sep 16 2020, 08:21 PM)
this may not be applicable to your current scenario, as what you have got planned is likely already fixed for, and is too late to make changes to.

but if you're talking about data sizes of above 100 million rows (or your initial example of possibly above 1 billion rows), then maybe it's time to consider alternatives to a RDBMS. but it's also difficult to determine the suitability without knowing the exact use case for that table you're referring to.
*
did simple calculation based on 20k inserts, it takes ~4 seconds to generate just 20k rows(for 2 tables), for 100m rows, it need over half a day, for 1 bil rows, i dont really have the time to wait.

so im generating a total of 200m rows(100m in each table), 1 table only has groupid as index and the other table has groupid+status as index

This post has been edited by narf03: Sep 17 2020, 12:33 AM
kevinlim001
post Sep 17 2020, 09:57 AM

Ethical Hacker
*******
Senior Member
5,290 posts

Joined: May 2006
From: Planet called "EARTH"



QUOTE(narf03 @ Sep 15 2020, 09:45 PM)
I know it seems like a noob question, please try to understand then only comment, thanks.

Lets say you have a table with 1000 million rows, selecting anything without index will be slow, and too many index will waste space, so minimum index should be the way to go, im trying to achieve that.

Lets say the table has many columns and we will just use 3 of them for out discussion here.

ID, integer, primary key
GroupID, integer, many duplicates, indexed
Status, integer, only 0 and 1, indicate enable and disable(just assume)

If i want to quickly search for rows that has the same group, i will need to index GroupID, i wonder if that index will help if i search GroupID and Status at the same time ? Will the SQL server smart enough to use that index to filter GroupID then only use the resultset to filter status ?
And how much is the different of performance if i have another index that uses both GroupID and Status ?
*
assuming you are using mssql, you can use the query analyzer to check your "IF" doubts..

write two select statement and see the load between two statements.


You can try to select GroupID and Status filtered together OR you can try select ID where GroupID = X then select status from the short listed IDs.

Also, you can try to create a non clustered index for both groupid and status and see if that improves.
bumpo
post Sep 17 2020, 11:04 AM

On my way
****
Junior Member
556 posts

Joined: Mar 2013


QUOTE(narf03 @ Sep 15 2020, 09:45 PM)
I know it seems like a noob question, please try to understand then only comment, thanks.

Lets say you have a table with 1000 million rows, selecting anything without index will be slow, and too many index will waste space, so minimum index should be the way to go, im trying to achieve that.

Lets say the table has many columns and we will just use 3 of them for out discussion here.

ID, integer, primary key
GroupID, integer, many duplicates, indexed
Status, integer, only 0 and 1, indicate enable and disable(just assume)

If i want to quickly search for rows that has the same group, i will need to index GroupID, i wonder if that index will help if i search GroupID and Status at the same time ? Will the SQL server smart enough to use that index to filter GroupID then only use the resultset to filter status ?
And how much is the different of performance if i have another index that uses both GroupID and Status ?
*
this depends on how you write your query

lets assume ordering is important e.g. for pagination, pulling rows 10000 to 10010.

SELECTz x.id, x.groupid, x.status, x.blabla FROM bigtable x where x.groupid = '1' and x.status = 1 and x.blabla = 'z' order by x.id LIMIT 10000, 10

mysql for one does not filter using index first then only use the filtered resultset for blabla. thus this will be slower the bigger the data set goes

tweaking the query a bit like below will trick mysql into the exact behavior that you want, filter using index then use filtered resultset for non index

SELECTz x.id, x.groupid, x.status, x.blabla FROM
(selectz id from bigtable order by id LIMIT 10000, 10) forceindex
JOIN bigtable x
ON x.id = forceindex.id


*SELECTz = SELECT.. .else cloudfare dont allow posting.. doh.gif

This post has been edited by bumpo: Sep 17 2020, 11:04 AM
TSnarf03
post Sep 17 2020, 06:33 PM

Look at all my stars!!
*******
Senior Member
4,058 posts

Joined: Dec 2004
From: Metro Prima, Kuala Lumpur, Malaysia, Earth, Sol


QUOTE(bumpo @ Sep 17 2020, 11:04 AM)
this depends on how you write your query

lets assume ordering is important e.g. for pagination, pulling rows 10000 to 10010.

SELECTz x.id, x.groupid, x.status, x.blabla FROM bigtable x where x.groupid = '1' and x.status = 1 and x.blabla = 'z' order by x.id LIMIT 10000, 10

mysql for one does not filter using index first then only use the filtered resultset for blabla. thus this will be slower the bigger the data set goes

tweaking the query a bit like below will trick mysql into the exact behavior that you want, filter using index then use filtered resultset for non index

SELECTz x.id, x.groupid, x.status, x.blabla FROM
(selectz id from bigtable order by id LIMIT 10000, 10) forceindex
JOIN bigtable x
ON x.id = forceindex.id
*SELECTz = SELECT.. .else cloudfare dont allow posting..  doh.gif
*
Good idea, I'll keep that in mind, tnx

 

Change to:
| Lo-Fi Version
0.0163sec    0.98    5 queries    GZIP Disabled
Time is now: 28th September 2020 - 05:23 PM