Friday, February 25, 2022

SBE Repeating Group Gotchas

SBE (Simple Binary Encoding) is about the fastest off-the-shelf way to serialize data.

To help support that, it really prefers you to use fixed-length fields. If you need an unknown number of submessages, you use repeating groups, and it's a little harder...
  1. There is very little documentation in general, and for repeating groups it's incomplete. A big omission: you must always explicitly decode every member of every repeating group, in the same order as they are defined in your schema. If you do not do this, your results could be mysteriously corrupted. Alternatively, you could explicitly "skip" the ones you are not interested in, with the relatively new sbeSkip() call. E.g.

           
            weWantSerialNumber(carDecoder.serialNumber());
            weWantModelYear(carDecoder.modelYear());

            // We don't want fuelFigures...
            FuelFiguresDecoder fuelFigures = carDecoder.fuelFigures();
            if (fuelFigures.count() > 0)                    // ...but if it was encoded...
            {                                               // ...we must iterate over it anyway
                while (fuelFigures.hasNext())
                {
                    fuelFigures.next();
                    fuelFigures.sbeSkip();
                }
            }

            PerformanceFiguresDecoder performanceFigures = carDecoder.performanceFigures();
            if (performanceFigures.count() > 0)
            {
                while (performanceFigures.hasNext())
                {
                    performanceFigures.next();
                    weWantOctaneRating(performanceFigures.octaneRating());
                    performanceFigures.sbeSkip(); // actually needed in the sample,
                }                                 // because there's a nested repeating group!
            }

  2. If you're working with code that does not use repeating groups, and you add one, lots of things might break unexpectedly. That's because wrapping a decoder with an incorrect blockLength will mostly work as long as you don't decode repeating groups or variable-length data fields. The blockLength you wrap with needs to come from the message header, which in turn needs to be the BLOCK_LENGTH of the encoder. This covers only the fixed-length root fields and excludes the length of the repeating groups. On the bright side, nowadays there are wrapAndApplyHeader() convenience methods on both the encoder and decoder objects.
  3. If you want to read repeating groups from the same decoder more than once, you need to call sbeRewind().
  4. There's currently a bug in which toString can break repeating groups.
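
To make gotchas 1 to 3 a bit more concrete, here is a toy model of the layout, written with nothing but java.nio. It is not SBE's generated code: ToyGroupDecoder, its field names, and the two-group message are all made up for illustration, and I'm assuming a simplified group header of two unsigned 16-bit values (blockLength, numInGroup). The point is only that groups are found via a moving cursor rather than at fixed offsets, so skipping a group or starting the cursor at the wrong block length quietly gives you the wrong data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy model of an SBE-like layout; NOT SBE's generated code. */
final class ToyGroupDecoder {
    static final int ROOT_BLOCK_LENGTH = 8;   // root block holds one long field
    static final int GROUP_HEADER_LENGTH = 4; // uint16 blockLength + uint16 numInGroup
    static final int ENTRY_LENGTH = 4;        // each group entry holds one int

    private final ByteBuffer buf;
    private final int rootBlockLength;
    private int limit; // where the next group header is expected to start

    ToyGroupDecoder(ByteBuffer buf, int rootBlockLength) {
        this.buf = buf;
        this.rootBlockLength = rootBlockLength;
        rewind();
    }

    /** Resets the cursor to just past the root block, in the spirit of sbeRewind(). */
    void rewind() {
        limit = rootBlockLength;
    }

    long serial() {
        return buf.getLong(0); // fixed offset, so it works even if the cursor is wrong
    }

    /** Decodes whatever group starts at the cursor and moves the cursor past it. */
    int[] nextGroup() {
        int blockLength = buf.getShort(limit) & 0xFFFF;
        int count = buf.getShort(limit + 2) & 0xFFFF;
        int[] entries = new int[count];
        int pos = limit + GROUP_HEADER_LENGTH;
        for (int i = 0; i < count; i++) {
            entries[i] = buf.getInt(pos);
            pos += blockLength;
        }
        limit = pos;
        return entries;
    }

    /** Layout: root block | group A header + entries | group B header + entries. */
    static ByteBuffer encode(long serial, int[] groupA, int[] groupB) {
        int size = ROOT_BLOCK_LENGTH + 2 * GROUP_HEADER_LENGTH
            + ENTRY_LENGTH * (groupA.length + groupB.length);
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(serial);
        for (int[] group : new int[][] {groupA, groupB}) {
            buf.putShort((short) ENTRY_LENGTH);
            buf.putShort((short) group.length);
            for (int entry : group) {
                buf.putInt(entry);
            }
        }
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = encode(42L, new int[] {1, 2, 3}, new int[] {9});

        ToyGroupDecoder ok = new ToyGroupDecoder(buf, ROOT_BLOCK_LENGTH);
        System.out.println(java.util.Arrays.toString(ok.nextGroup())); // group A
        System.out.println(java.util.Arrays.toString(ok.nextGroup())); // group B

        ok.rewind();
        // Asking for "group B" without walking group A first returns A's entries.
        System.out.println(java.util.Arrays.toString(ok.nextGroup()));

        // A wrong root block length: the fixed field still decodes, the groups do not.
        ToyGroupDecoder bad = new ToyGroupDecoder(buf, 4);
        System.out.println(bad.serial() + " " + bad.nextGroup().length);
    }
}
```

In this toy, the decoder wrapped with a block length of 4 still returns the right serial number, but its first group comes back empty because the "header" it reads is really the tail of the root block. That is the same flavor of failure as gotcha 2: everything looks fine until someone adds a repeating group.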

Thursday, February 17, 2022

Bookmarks and Full Text Search

Bookmarks suck. They don't have to.

They suck because they don't actually work for their primary use cases:
1. where did I read about that cool thing X? 
2. where was that cool article about Y that I didn't have time to read, but would like to read now?

Heck, at some point I found it easier to google for pages instead of trying to find them in my bookmarks. 

Other use cases should be handled slightly differently:
3. "Bookmarks Bar", "Pin a tab", and "Save a shortcut" features are all useful for commonly-used pages or web apps.
4. "Manage Search Engines" feature is useful for sites that you want to search often.
5. I'll bet Chrome soon adds support for saving tab groups, which will be useful for gathering lots of pages for a particular project, e.g. a bibliography.

Practically speaking, labels and categorization are annoying to add, and don't really help find things later. The titles that maintainers give to web pages are also often unhelpful for retrieval.

The solution is automatic full text search of all bookmarked pages.

That way, you can retrieve a page by anything that you see on the page, without any additional organizational work.
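
As a rough sketch of what that could look like under the hood, here is a tiny in-memory inverted index. It is not how Memex or any browser does it; BookmarkIndex, its methods, and the example.com URLs are hypothetical, and a real tool would also need to fetch pages, persist the index, and rank results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal in-memory full-text index over bookmarked pages (illustrative only). */
final class BookmarkIndex {
    // word -> URLs of the bookmarked pages that contain that word
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes every word of the page text under the page's URL. */
    void add(String url, String pageText) {
        for (String word : tokenize(pageText)) {
            postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
        }
    }

    /** Returns the URLs of pages that contain all of the query words. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> urls = postings.getOrDefault(word, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(urls);
            } else {
                result.retainAll(urls);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        BookmarkIndex index = new BookmarkIndex();
        index.add("https://example.com/sbe", "Repeating groups in Simple Binary Encoding");
        index.add("https://example.com/search", "Full text search with an inverted index");
        System.out.println(index.search("binary encoding"));
    }
}
```

No labels or folders are involved: bookmarking a page is the only organizational step, and any word you remember from the page is enough to find it again.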

On a Mac, you can already get this via Spotlight and print with "Save to PDF", though there's some user interaction involved.

Worldbrain's Memex supports full text search of bookmarks well, and for free. I have mixed feelings about it. On the one hand, this feature is implemented by an open source plugin, and the maintainers support local first software. On the other hand, they have a much bigger vision, and it's unclear how much they want to support the free part of it. It currently sports a non-dismissible reddish prompt to register for the cloud support.

Probably I'm Dunning-Krugering, but it just doesn't seem like that difficult of a feature to implement. And it feels like a core service. I wonder how Apple and Google might decide to take on a feature like this.

There's no reason to end on a low note. If you don't care about syncing across your devices, go ahead and install free Memex on your PC, and on your tablet!