In this article, we'll have a look at different ways of doing a case-insensitive string comparison in SQLite. For examples in this article, let's suppose we have the following table:
+----+---------------+ | id | title | +----+---------------+ | 1 | Lorem Ipsum | | 2 | Proin ut Ante | | 3 | Fusce vitae | +----+---------------+
CREATE TABLE `blog_post` ( id INTEGER PRIMARY KEY AUTOINCREMENT, title VARCHAR(25) NOT NULL ); INSERT INTO blog_post(`title`) VALUES ('Lorem Ipsum'), ('Proin ut Ante'), ('Fusce vitae');
Case-Insensitive Comparison for Non-English Characters
Before we learn how to do case-insensitive comparisons, it is important to know that by default, SQLite only supports string comparisons for ASCII characters. This means that only case mapping for letters in the English language are supported by default. You could use the SQLite ICU (International Components for Unicode) extension to support case mapping for full range of unicode characters for the following SQLite operations/functions:
- SQL Scalars
UPPER()
andLOWER()
; - Unicode aware
LIKE
operator; - ICU Collation Sequences;
- SQL
REGEXP
Operator.
Case-Insensitive Comparisons Using the COLLATE
Operator
The COLLATE
operator can be used with the built-in NOCASE
collating function to do case-insensitive comparisons, like so:
SELECT * FROM `blog_post` WHERE `title` = 'lorem ipsum' COLLATE NOCASE
The COLLATE
operator can be used with other operators as well. For example, to use it with the IN
operator, we could do the following:
SELECT * FROM `blog_post` WHERE `title` COLLATE NOCASE IN ('lorem ipsum', 'fusce vitae')
Declaring Column as COLLATE NOCASE
:
Alternatively, we could declare the column itself as case-insensitive by adding COLLATE NOCASE
in the CREATE TABLE
column definition like so:
CREATE TABLE `blog_post` ( id INTEGER PRIMARY KEY AUTOINCREMENT, title VARCHAR(25) NOT NULL COLLATE NOCASE );
Now, we won't have to explicitly use COLLATE NOCASE
in our queries. For example, the following query will do a case-insensitive comparison now as the column itself is defined as COLLATE NOCASE
:
SELECT * FROM `blog_post` WHERE `title` = 'lorem ipsum'
Optimizing With Index:
To optimize lookups, we can create an index with COLLATE NOCASE
for the column with case-insensitive strings. For example:
CREATE INDEX `IDX_BLOG_POST_TITLE` ON `blog_post` (`title` COLLATE NOCASE);
Without a COLLATE NOCASE
index, our queries will do a full table scan. To demonstrate this, let's do EXPLAIN QUERY PLAN
on our query:
# without index: EXPLAIN QUERY PLAN SELECT * FROM `blog_post` WHERE `title` = 'lorem ipsum'; # output: SCAN TABLE blog_post
As you can see in the example above, SQLite does a full-table scan and visits all rows in the table when we don't have an index. However, with the index added, we get the following output with EXPLAIN QUERY PLAN
:
# with index: EXPLAIN QUERY PLAN SELECT * FROM `blog_post` WHERE `title` = 'lorem ipsum'; # output: SEARCH TABLE post USING INDEX IDX_BLOG_POST_TITLE (title=?)
As evident from the example above, when COLLATE NOCASE
index is present, the query does not scan all rows.
Case-Insensitive Comparisons With the LIKE
Operator
By default, the SQLite LIKE
operator is case-insensitive for ASCII characters (which covers all english language letters), and case-sensitive for unicode characters that are beyond the ASCII range. Therefore, for English characters we could use LIKE
for case-insensitive comparison like so:
SELECT * FROM `blog_post` WHERE `title` LIKE 'lorem ipsum';
And for non-english characters we could use a unicode-aware LIKE
(for example, from the ICU extension).
Using LIKE
operator may have performance issues, especially for large tables, as SQLite will do a full-scan of the rows unless the column you're using LIKE
on has a COLLATE NOCASE
index.
Making the Compared Strings the Same Case
We could use UPPER()
or LOWER()
function on both sides of string comparison so that we have the same case on either side. For example:
SELECT * FROM `blog_post` WHERE UPPER(`title`) = UPPER('lorem ipsum');
SELECT * FROM `blog_post` WHERE LOWER(`title`) = LOWER('lorem ipsum');
SQLite does not make use of an index when using UPPER()
or LOWER()
functions. Therefore, it ends up doing a full-scan; going through all rows and converting them to upper or lower case (depending on the function you use) before doing the comparison. This would have performance issues, especially for larger tables.
This post was published by Daniyal Hamid. Daniyal currently works as the Head of Engineering in Germany and has 20+ years of experience in software engineering, design and marketing. Please show your love and support by sharing this post.