Background of the question:
Initially, SQL was mainly used for working with structured data, where searching in text fields was limited to simple operations like LIKE. With the growth of text information, there arose a need to quickly and flexibly search through large texts: articles, messages, blogs, etc.
Problem:
Standard SQL tools (LIKE/ILIKE) perform poorly with large volumes of text and cannot effectively find words based on relevance, taking into account morphology or the distance between words. This can lead to performance loss and excessively long search responses.
Solution:
For such tasks, full-text search mechanisms (Full-Text Search, FTS) built into DBMS are used, such as Full-Text Index and special operators (CONTAINS, MATCH AGAINST, tsvector, tsquery). These indexes create a "word card" ("inverted index"), speeding up text searches by tens of times.
Example code (SQL Server):
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT; CREATE FULLTEXT INDEX ON Documents(Content) KEY INDEX PK_Documents; SELECT * FROM Documents WHERE CONTAINS(Content, '"SQL programming"');
Key features:
What is the difference between searching with LIKE and full-text search?
LIKE is a simple comparison operation with a pattern that does not use text indexes, and is slow for large volumes. Full-text uses a special index and can take into account morphology and relevance.
Example:
SELECT * FROM articles WHERE body LIKE '%database%'; -- slow, no ranking SELECT * FROM articles WHERE MATCH(body) AGAINST ('database'); -- fast, with ranking
What happens to the full-text index during massive inserts or deletions?
After bulk changes to text fields, the index becomes outdated (sometimes auto-updating, sometimes manually), and re-indexing is needed to restore performance.
-- For MSSQL ALTER FULLTEXT INDEX ON Documents START FULL POPULATION;
Can full-text indexes be used to search in JSON or XML type columns?
No, most full-text engines do not have direct support for JSON/XML structures; such data needs to be either extracted to a string field or special parsers/external tools (like Elasticsearch) should be used.
LIKE '%word%' operator on large tables — catastrophic performanceA company stored tens of millions of article records. LIKE '%word%' was used for searching. The IT department complained about regular timeouts, users waited over 10 minutes for results.
Pros:
Cons:
Implemented Full-Text Search (FULLTEXT INDEX in MySQL). Searching became up to 100 times faster, the ability to search for "similar" words and phrases was added, along with ranking.
Pros:
Cons: