ProgrammingBackend Developer

How to implement efficient full-text search in SQL? What mechanisms are available for full-text search and what should be considered when working with large volumes of text data?

Pass interviews with Hintsage AI assistant

Answer.

Background of the question:
Initially, SQL was mainly used for working with structured data, where searching in text fields was limited to simple operations like LIKE. With the growth of text information, there arose a need to quickly and flexibly search through large texts: articles, messages, blogs, etc.

Problem:
Standard SQL tools (LIKE/ILIKE) perform poorly with large volumes of text and cannot effectively find words based on relevance, taking into account morphology or the distance between words. This can lead to performance loss and excessively long search responses.

Solution:
For such tasks, full-text search mechanisms (Full-Text Search, FTS) built into DBMS are used, such as Full-Text Index and special operators (CONTAINS, MATCH AGAINST, tsvector, tsquery). These indexes create a "word card" ("inverted index"), speeding up text searches by tens of times.

Example code (SQL Server):

CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT; CREATE FULLTEXT INDEX ON Documents(Content) KEY INDEX PK_Documents; SELECT * FROM Documents WHERE CONTAINS(Content, '"SQL programming"');

Key features:

  • Operates on the basis of special full-text indexes, separate from regular ones.
  • Supports relevance queries, lemmatization, stop-word recognition, and complex conditions (NOT, OR, proximity).
  • Requires index maintenance when there are massive data changes — periodic re-indexing.

Trick questions.

What is the difference between searching with LIKE and full-text search?

LIKE is a simple comparison operation with a pattern that does not use text indexes, and is slow for large volumes. Full-text uses a special index and can take into account morphology and relevance.

Example:

SELECT * FROM articles WHERE body LIKE '%database%'; -- slow, no ranking SELECT * FROM articles WHERE MATCH(body) AGAINST ('database'); -- fast, with ranking

What happens to the full-text index during massive inserts or deletions?

After bulk changes to text fields, the index becomes outdated (sometimes auto-updating, sometimes manually), and re-indexing is needed to restore performance.

-- For MSSQL ALTER FULLTEXT INDEX ON Documents START FULL POPULATION;

Can full-text indexes be used to search in JSON or XML type columns?

No, most full-text engines do not have direct support for JSON/XML structures; such data needs to be either extracted to a string field or special parsers/external tools (like Elasticsearch) should be used.

Common mistakes and anti-patterns

  • Using the LIKE '%word%' operator on large tables — catastrophic performance
  • Re-indexing is not performed, search becomes irrelevant
  • Language features and stop-words are not taken into account
  • Indexing several gigabytes of data without additional resources

Real-life example

Negative case

A company stored tens of millions of article records. LIKE '%word%' was used for searching. The IT department complained about regular timeouts, users waited over 10 minutes for results.

Pros:

  • No need for additional licenses or configurations
  • Simple implementation

Cons:

  • Poor performance, especially at large volumes
  • Unrealistic response times from the system
  • Incorrect search results (no word form consideration)

Positive case

Implemented Full-Text Search (FULLTEXT INDEX in MySQL). Searching became up to 100 times faster, the ability to search for "similar" words and phrases was added, along with ranking.

Pros:

  • Instant search
  • Relevant results, support for morphology
  • Scalability

Cons:

  • Resources needed for index maintenance
  • Index is created on string fields, does not work for nested structures