Posts Tagged ‘postgresql’

Ignoring diacriticals while searching in PostgreSQL

Thursday, November 13th, 2008

As mentioned before, I’m working with BaseTen for access to postgres. One thing my client wants is the ability to just enter a search term and find most anything (Googly style database search). This isn’t so hard, you just take any string and do case insensitive like searches on the most likely fields in the database in a big OR statement.

BaseTen builds queries out of NSPredicates and NSPredicate supports an operator ‘like[cd]‘ where the c option is to ignore case and the d option is to ignore diacriticals (accents and other funny squiggles found in non-english text). Postgres does not offer an option to ignore diacriticals in search. So how to support this?

The good news is that the unicode characters are laid out such that taking the numeric value and taking the modulo of 128 (the size of the ascii table) results in the stripped version.

I wrote a PgPL/SQL function that converts every character in a string to its ascii equivalent, then compare those.

CREATE OR REPLACE FUNCTION asciify(unicode text) RETURNS text AS $$
DECLARE
    translated text;
BEGIN
    translated := '';
    FOR i in 1..(char_length(unicode)) LOOP
        translated := translated || chr(ascii(substring(unicode,i,1))%128);
    END LOOP;
    RETURN translated;
END;
$$ LANGUAGE plpgsql;

The asciify’d versions of Panama and Panamá will end up being Panama and I can do searches like

SELECT name from countries where asciify(name) ~* asciify(?);

and get the equivalent of a case insensitive and diacritical insensitive ‘like’.