Partial Search Terms

Viewing 5 reply threads
  • Author
    Posts
  • January 19, 2021 at 10:31 AM #43248

    Doug Bank
    Participant

    I have a a number of ID fields on my forms. They capture the ID number I am assigning to things in my collection, but I also capture IDs that might have been assigned by previous owners or the sellers or on eBay, etc.

    In this case, the ID on the record was JH15813. However, the ID physically printed on the specimen was just 15813. I assumed that I could search on 15813 and it would show me any records that match. However, I get no matches. It only works if I match exactly the same way it is on the record.

    After a bit of experimentation, I figured out that partial searches work as long as I have the beginning of the term correct. So I can find “JH15” but cannot find “15813”.

    Is this a bug or have you just never implemented the search so that you can find word fragments if they are not at the beginning of a word?

    January 19, 2021 at 3:37 PM #43250

    Brendan
    Keymaster

    This is a consequence of me using SQLite’s FTS (full text search) engine. It only works by searching for word prefixes. It won’t work on word suffixes or substrings of words. There’s nothing I can do about it unfortunately as that’s how SQLite was written.

    However, if you put a dash after the text based prefix, then it would work because punctuation is ignored in the search engine. For example JH-15813 would be returned if you just searched for 15813.

    Alternatively you can create a separate Calculation or Script field that takes your ID number and strips out the JH and returns just the numeric part. Then you could search on that.

    January 19, 2021 at 6:46 PM #43251

    Doug Bank
    Participant

    It’s not a great solution, in my opinion, because I have no idea how people will format these numbers, but I created a calculation that takes that field and creates 4 versions.

    CONCAT(RIGHT(ID; LENGTH(ID)-1 ) ;” “;RIGHT(ID; LENGTH(ID)-2) ;” “; RIGHT(ID; LENGTH(ID)-3);” “; RIGHT(ID; LENGTH(ID)-4))

    I figure that probably one of those will match. Then I hid the field so that I don’t have to see it… Ugly, but it seems to work, though only for this one field.

    Not much of a “FULL” text search engine, is it? Sigh.

    Thanks anyway

    January 19, 2021 at 8:36 PM #43253

    Sam Moffatt
    Participant

    It looks like a standard term or term prefix search engine implementation, for text content it’s generally a solid trade off of space storage versus recall. Not so great for stuff that doesn’t tokenise into words like these ID’s though but that isn’t generally regarded as text either.

    What you’re looking for is usually handled by a n-gram tokeniser, generally a trigram (where your minimum three character limit comes from) which can more effectively handle these use cases at the cost of extra storage as you’re seeing with the field you’re adding. In looking up what SQLite did, it looks like they added an experimental trigram tokenizer option in FTS5 though it has it’s own limitations depending on options used (one of which being unable to use terms longer than three characters).

    January 19, 2021 at 8:39 PM #43254

    Brendan
    Keymaster

    It’s always about trade-offs.

    January 20, 2021 at 5:21 PM #43259

    Doug Bank
    Participant

    In this case the tradeoff is inside my head. Or actually, it is in the head of the guy who used different ways of writing the same ID, thus confusing me. Curiously, I had already formatted something like this for my own IDs. The number is generated as a number, but the ID I use takes that number and concatenates with with a prefix and suffix. However, since the number is in a hidden field, I can search on it!

    Thanks everyone.

Viewing 5 reply threads

You must be logged in to reply to this topic.