
Database indexing

Enterprise websites can have hundreds of thousands of items and millions of words. Finding a particular item can take minutes or even hours. Database indexing reduces the search time to seconds.

Searching a database without an index

A database is usually represented as a table of rows and columns, similar to a spreadsheet.

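Table 4. Content

| ID | Author       | Date Published | Headline                                                                              | URL        |
|----|--------------|----------------|---------------------------------------------------------------------------------------|------------|
| 1  | Al Falfa     | 2020-01-25     | Galaxies Collide, No Injuries Reported                                                | /galaxies  |
| 2  | Paige Turner | 2020-01-26     | Power Outage Makes Soft Ice Cream Softer                                              | /power     |
| 3  | Al Falfa     | 2020-01-27     | After 200 Years, Astronaut Jane Collins Escapes Black Hole, Doesn't Look One Day Older | /astronaut |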



Referring to the previous example, the database has a table Content. The table has five columns: ID, Author, Date Published, Headline, and URL. (Brightspot uses a different layout for its databases.)

To find the article written by Paige Turner, a server must examine each row in the table. Searches in short tables such as this are very fast; a Brightspot server can find the article in milliseconds.
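The row-by-row search described above can be sketched in a few lines of Python. This is only an illustration: the list `CONTENT` and the function `find_by_author` are hypothetical names, and the in-memory list stands in for a real database table (it does not reflect how Brightspot stores content).

```python
# Hypothetical in-memory stand-in for the table Content.
CONTENT = [
    {"id": 1, "author": "Al Falfa", "date_published": "2020-01-25",
     "headline": "Galaxies Collide, No Injuries Reported", "url": "/galaxies"},
    {"id": 2, "author": "Paige Turner", "date_published": "2020-01-26",
     "headline": "Power Outage Makes Soft Ice Cream Softer", "url": "/power"},
    {"id": 3, "author": "Al Falfa", "date_published": "2020-01-27",
     "headline": "After 200 Years, Astronaut Jane Collins Escapes Black Hole, "
                 "Doesn't Look One Day Older", "url": "/astronaut"},
]


def find_by_author(rows, author):
    """Full scan: examine every row and keep those whose author matches."""
    return [row for row in rows if row["author"] == author]


matches = find_by_author(CONTENT, "Paige Turner")
print([row["url"] for row in matches])  # prints ['/power']
```

The work done by `find_by_author` grows in step with the number of rows, which is why the same search becomes slower as the table grows.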

However, over time, more and more articles are published. Suppose one year later there are 1,000 articles. Our table now looks like this.

Table 5. Content expanded

| ID   | Author       | Date Published | Headline                                                                              | URL        |
|------|--------------|----------------|---------------------------------------------------------------------------------------|------------|
| 1    | Al Falfa     | 2020-01-25     | Galaxies Collide, No Injuries Reported                                                | /galaxies  |
| 2    | Paige Turner | 2020-01-26     | Power Outage Makes Soft Ice Cream Softer                                              | /power     |
| 3    | Al Falfa     | 2020-01-27     | After 200 Years, Astronaut Jane Collins Escapes Black Hole, Doesn't Look One Day Older | /astronaut |
| ...  | ...          | ...            | ...                                                                                   | ...        |
| 998  | Anne Drew    | 2021-01-03     | Food Critic Takes Revenge on Celebrity Chef                                           | /food      |
| 999  | Al Falfa     | 2021-01-03     | Buster Jones Loses Flattest Pancake Contest by 3 Millimeters                          | /pancake   |
| 1000 | Paige Turner | 2021-01-03     | Insomniacs Embrace High-Stakes Turtle Racing                                          | /insomniac |



To find the articles written by Paige Turner, our server now needs to check all 1,000 articles. As our database continues to grow, the response time to retrieve content that matches the search criteria becomes longer and longer.

Another factor that increases retrieval times is the complexity of the query. Instead of searching for articles written by Paige Turner, we may want to find all the articles she wrote on January 3, 2021. In this scenario, our server needs to examine each article twice: once to verify the name matches, and another time to verify the date matches.

(Some shopping sites have multiple search criteria. For example, a housewares site can let visitors run a query with the following structure:

  • Department: Kitchen

  • Category: Coffee grinders

  • Manufacturer: Bean Pulverizer, Inc.

  • Price: $30–$60

  • Color: Blizzard Whiteout

In this scenario, the server needs to check every record five times for department, category, manufacturer, price, and color.)
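To make the cost of multi-condition queries concrete, here is a minimal sketch that counts how many field comparisons a full scan performs. The function `scan` and the sample rows are hypothetical, and the sketch deliberately checks every condition on every row to mirror the description above; a real database engine can often stop checking a row at the first condition that fails.

```python
def scan(rows, criteria):
    """Return the rows matching every criterion and the comparisons made."""
    comparisons = 0
    matches = []
    for row in rows:
        row_matches = True
        for field, wanted in criteria.items():
            comparisons += 1  # one check per row per search condition
            if row[field] != wanted:
                row_matches = False
        if row_matches:
            matches.append(row)
    return matches, comparisons


# A few rows taken from the table Content expanded (other columns omitted).
rows = [
    {"id": 2, "author": "Paige Turner", "date_published": "2020-01-26"},
    {"id": 998, "author": "Anne Drew", "date_published": "2021-01-03"},
    {"id": 999, "author": "Al Falfa", "date_published": "2021-01-03"},
    {"id": 1000, "author": "Paige Turner", "date_published": "2021-01-03"},
]
criteria = {"author": "Paige Turner", "date_published": "2021-01-03"}

matches, comparisons = scan(rows, criteria)
print([row["id"] for row in matches], comparisons)  # prints [1000] 8
```

With four rows and two conditions the scan makes eight comparisons; with five conditions it would make twenty. The count is the number of rows multiplied by the number of conditions.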

Any modern server can retrieve records from a small database in milliseconds, even when the retrieval involves complex search conditions. However, with a database containing millions of entries, a retrieval can take several minutes. In these cases, a visitor or editor may think that the site has stopped working.

How indexes improve retrieval speeds

One way of improving retrieval times when querying large databases is to add indexes. An index is a separate table in a database that lists, for each value of a search field, which records contain that value. Referring to the table Content expanded, suppose we want to build an index for the authors.

Table 6. Index by author

| Author       | Article ID |
|--------------|------------|
| Anne Drew    | 998        |
| Al Falfa     | 1, 3, 999  |
| Paige Turner | 2, 1000    |



To find all the articles written by Paige Turner, all our server needs to do is find her entry in the table Index by author and read her corresponding article IDs. Knowing those article IDs, our server retrieves the matching records directly from the table Content expanded, without examining any of the other rows.
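The author index can be sketched as a dictionary that maps each author to a list of article IDs. The names `AUTHOR_BY_ID` and `build_index` are hypothetical, and the dictionary is only a stand-in: production databases typically use structures such as B-trees for their indexes.

```python
from collections import defaultdict

# Author column of the rows shown in the table Content expanded, keyed by ID.
AUTHOR_BY_ID = {
    1: "Al Falfa",
    2: "Paige Turner",
    3: "Al Falfa",
    998: "Anne Drew",
    999: "Al Falfa",
    1000: "Paige Turner",
}


def build_index(author_by_id):
    """Build the index in one pass over the table: author -> article IDs."""
    index = defaultdict(list)
    for article_id, author in sorted(author_by_id.items()):
        index[author].append(article_id)
    return dict(index)


index_by_author = build_index(AUTHOR_BY_ID)

# A single dictionary lookup replaces the scan over every row.
print(index_by_author["Paige Turner"])  # prints [2, 1000]
```

Building the index still requires one pass over the table, but that pass happens once; every later search by author reuses the result.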

A table can have more than one index. Referring to table Content expanded, we can add an index by date published.

Table 7. Index by date published

| Date Published | Article ID     |
|----------------|----------------|
| 2020-01-25     | 1              |
| 2020-01-26     | 2              |
| 2020-01-27     | 3              |
| 2021-01-03     | 998, 999, 1000 |



With multiple indexes, we can improve our retrieval times for complex queries. For example, to find the articles written by Paige Turner on January 3, 2021, our server does two lookups in the indexes and then combines the results:

  1. Find the articles written by Paige Turner (IDs 2 and 1000)

  2. Find the articles published on January 3, 2021 (IDs 998, 999, and 1000)

  3. Return the articles common to both lookups (ID 1000)

Lookup strategies involving indexes provide vastly reduced retrieval times compared to examining every item in a table with millions of entries.
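The two lookups and the final combination step can be sketched with Python sets, where the "articles common to both lookups" are a set intersection. The dictionaries below copy the two index tables; the function name `find_ids` is hypothetical.

```python
# The two indexes from the tables above, with article IDs stored as sets.
index_by_author = {
    "Anne Drew": {998},
    "Al Falfa": {1, 3, 999},
    "Paige Turner": {2, 1000},
}
index_by_date = {
    "2020-01-25": {1},
    "2020-01-26": {2},
    "2020-01-27": {3},
    "2021-01-03": {998, 999, 1000},
}


def find_ids(author, date_published):
    """Look up each index once, then keep the IDs present in both results."""
    by_author = index_by_author.get(author, set())
    by_date = index_by_date.get(date_published, set())
    return sorted(by_author & by_date)


print(find_ids("Paige Turner", "2021-01-03"))  # prints [1000]
```

Neither lookup touches the content table itself; only the final list of IDs is used to fetch the matching records.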

Costs of indexing

Suppose Paige writes a new article. To ensure our retrievals are correct, we need to update the index.

| Author       | Article ID    |
|--------------|---------------|
| Anne Drew    | 998           |
| Al Falfa     | 1, 3, 999     |
| Paige Turner | 2, 1000, 1001 |

Updating an index is additional work that a server must perform. If many authors are publishing many articles, then the server must frequently update the index (in addition to storing the content in the table Content expanded). If there are multiple indexes associated with a table (such as Index by author and Index by date published), the burden of keeping them updated grows accordingly. While the server is busy updating indexes, it has fewer resources to perform lookups, and the result is potentially long retrieval times.
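The extra work per published article can be sketched as follows. The class `ContentTable` and its counter `index_writes` are hypothetical, and the publication date used for article 1001 is invented purely for illustration.

```python
class ContentTable:
    """Toy table that updates two indexes immediately on every insert."""

    def __init__(self):
        self.rows = {}
        self.index_by_author = {}
        self.index_by_date = {}
        self.index_writes = 0  # counts index updates, not row writes

    def insert(self, article_id, author, date_published):
        # One write to store the content itself...
        self.rows[article_id] = {"author": author,
                                 "date_published": date_published}
        # ...plus one additional write for each index on the table.
        self.index_by_author.setdefault(author, []).append(article_id)
        self.index_by_date.setdefault(date_published, []).append(article_id)
        self.index_writes += 2


table = ContentTable()
table.insert(2, "Paige Turner", "2020-01-26")
table.insert(1000, "Paige Turner", "2021-01-03")
table.insert(1001, "Paige Turner", "2021-01-04")  # date is illustrative only

print(table.index_by_author["Paige Turner"])  # prints [2, 1000, 1001]
print(table.index_writes)                     # prints 6
```

Three inserts caused six index updates. Each additional index adds one more update to every insert, which is the maintenance burden described above.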

Optimizing an indexing strategy

To avoid a situation in which a server spends more time indexing content than retrieving it, organizations design an optimized indexing strategy. Some organizations update their indexes only at night or once every hour. Given their enterprise-grade environment, publishers almost always update their indexes immediately, which increases the load on a server.

Given the burden of maintaining an index, an optimized content database indexes only those fields that are used in lookups. Referring to table Content, there is a column URL. Searching by URL is very rare, so this column is not indexed; the burden of indexing it would not be repaid by any meaningful improvement in retrieval times.

Another optimizing strategy is to postpone indexing until server loads are low. The risk of postponing an index update is that retrievals will not include items that have not yet been indexed, so a visitor will not see them.
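The trade-off of postponed indexing can be sketched with a queue of pending updates. The class `DeferredIndex` and its methods are hypothetical names for illustration; this is not a description of how Brightspot schedules its indexing.

```python
class DeferredIndex:
    """Toy author index whose updates are queued and applied later."""

    def __init__(self):
        self.index_by_author = {}
        self.pending = []  # published articles that are not yet indexed

    def publish(self, article_id, author):
        # Cheap at publish time: only remember that the index needs updating.
        self.pending.append((article_id, author))

    def search(self, author):
        # Searches see only what has already been indexed.
        return list(self.index_by_author.get(author, []))

    def flush(self):
        # Run when server load is low, for example on a nightly schedule.
        for article_id, author in self.pending:
            self.index_by_author.setdefault(author, []).append(article_id)
        self.pending.clear()


index = DeferredIndex()
index.publish(1001, "Paige Turner")
print(index.search("Paige Turner"))  # prints [] because 1001 is not indexed yet
index.flush()
print(index.search("Paige Turner"))  # prints [1001]
```

Between `publish` and `flush`, the new article exists in the database but is invisible to searches, which is the risk described above.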

Indexing and database performance are complex topics, and there are many approaches to optimizing them. Brightspot comes with an indexing strategy that is already optimized for most enterprise publishers. Nevertheless, if your editors or visitors have specialized situations that demand faster response times for their retrievals, contact your Brightspot representative.