ofalkaed 15 hours ago

Surprisingly informative for what is pretty much a press release; I learned a good deal about search engines.

  • marginalia_nu 14 hours ago

    (author)

    I'm kinda allergic to writing "I did the thing" posts, so I can't help but try hard to make them compelling somehow.

    Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.

    • ofalkaed 14 hours ago

      In my experience, that is pretty much what marginalia search is. I rarely get what I expect, but I always get something very interesting that helps me understand my expectations better, which is very helpful in accomplishing my goals. Thanks for your work; marginalia is probably my favorite little corner of the web.

mariusor 14 hours ago

Off topic, but would there be a way to integrate marginalia with a specific website? Similar to how people use Google search for their forums, or how HN uses Algolia?

I'm asking because one of my projects is a link aggregator similar to old reddit (and HN to some extent), and I would like to present users with a search box without having to implement document indexing and search myself. (I assume from the outset that the website is already aligned ethically and technologically with what Marginalia stands for :D)

  • marginalia_nu 13 hours ago

    Should be soon-ish. I'm working right now on laying the groundwork for ad-hoc domain filters. That's technically already possible, but it comes with such a large performance impact that it degrades the search results.

    When it works, one of the things I have in mind is making site-search-esque functionality available, as well as exposing it via the public API so that it can be whiteboxed.
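A site search is essentially a query with a domain restriction. The generic sketch below uses hypothetical names and toy data and is not Marginalia's actual index or API. It shows why the point where such a filter is applied matters: filtering after a top-k cut can leave nothing from the requested domain, while filtering during the candidate scan keeps those results but means the restriction has to be checked against every candidate.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Result:
    # Hypothetical search hit: document id, its domain, and a relevance score.
    doc_id: int
    domain: str
    score: float


def top_k_then_filter(candidates: Iterable[Result], domains: set[str], k: int) -> list[Result]:
    # Rank everything first and filter afterwards: if the k best hits all come
    # from other domains, the site search comes back empty.
    best = heapq.nlargest(k, candidates, key=lambda r: r.score)
    return [r for r in best if r.domain in domains]


def filter_then_top_k(candidates: Iterable[Result], domains: set[str], k: int) -> list[Result]:
    # Apply the domain restriction while scanning candidates, so the top-k cut
    # only competes among documents from the requested domains.
    matching = (r for r in candidates if r.domain in domains)
    return heapq.nlargest(k, matching, key=lambda r: r.score)


hits = [
    Result(1, "news.example", 0.9),
    Result(2, "news.example", 0.8),
    Result(3, "forum.example", 0.5),
]
print(top_k_then_filter(hits, {"forum.example"}, 2))  # []
print(filter_then_top_k(hits, {"forum.example"}, 2))  # [Result(doc_id=3, domain='forum.example', score=0.5)]
```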

reedf1 15 hours ago

Took me too long to realize this wasn't a tool to search for marginalia in scanned manuscripts.

  • iamnothere 5 hours ago

    Hey, at least it isn’t named after a very large number, an excited exclamation, or a sound effect. Surely no product with one of those names would ever succeed.

    • marginalia_nu 3 hours ago

      I probably should have named it cartoon-trombone.wav in retrospect.

juliend2 8 hours ago

I remember asking you for this, so thank you so much! It works quite well from what I can see.

Small UI issue: on desktop, the left sidebar should be scrollable. Right now in Firefox I can't reach the "Language" menu item in the search results view unless I zoom out.

internet_points 14 hours ago

What tools/data do you use for POS tagging? I'm guessing it has to be fast to run without a Google data center :)

  • marginalia_nu 13 hours ago

    I'm using RDRPOSTagger[1], though I've optimized the code a bit so that it's not just algorithmically efficient but also uses the language in a way that is fast. It isn't perfect, but it's good enough to be useful.

    Language detection and sentence splitting are the other two slow bits of processing.

    [1] https://github.com/datquocnguyen/RDRPOSTagger
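RDRPOSTagger is built on Single Classification Ripple Down Rules (SCRDR). Each word first gets an initial tag, roughly its most frequent tag from a lexicon, and a tree of exception rules then corrects it. The toy sketch below shows only that inference step. The lexicon, the context features, and the single rule are made up for illustration; this is neither RDRPOSTagger's code nor its learned rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuleNode:
    condition: Callable[[dict], bool]
    conclusion: Optional[str]                   # None means "keep the initial tag"
    except_child: Optional["RuleNode"] = None   # tried when this condition holds
    if_not_child: Optional["RuleNode"] = None   # tried when it does not


def evaluate(root: RuleNode, ctx: dict) -> Optional[str]:
    # SCRDR inference: the answer is the conclusion of the last node
    # whose condition was satisfied along the path through the tree.
    result, node = None, root
    while node is not None:
        if node.condition(ctx):
            result = node.conclusion
            node = node.except_child
        else:
            node = node.if_not_child
    return result


def tag(words: list[str], lexicon: dict[str, str], root: RuleNode, default: str = "NN") -> list[str]:
    # Step 1: initial tagging from a hypothetical word -> most-frequent-tag lexicon.
    initial = [lexicon.get(w.lower(), default) for w in words]
    tags = []
    for i, word in enumerate(words):
        # Step 2: build a small context from the initial tags and let the rule tree correct it.
        ctx = {
            "word": word.lower(),
            "tag": initial[i],
            "prev_tag": initial[i - 1] if i > 0 else "<s>",
        }
        fixed = evaluate(root, ctx)
        tags.append(fixed if fixed is not None else initial[i])
    return tags


# Toy lexicon and one exception rule: "MD" right after "DT" is really a noun ("the can").
LEXICON = {"i": "PRP", "can": "MD", "open": "VB", "the": "DT"}
ROOT = RuleNode(
    condition=lambda c: True,  # default rule: always fires and keeps the initial tag
    conclusion=None,
    except_child=RuleNode(lambda c: c["tag"] == "MD" and c["prev_tag"] == "DT", "NN"),
)

print(tag("I can open the can".split(), LEXICON, ROOT))
# ['PRP', 'MD', 'VB', 'DT', 'NN']
```

The exception tree keeps each correction local to the context where the initial tagger was wrong, which is part of why this style of tagger is cheap to run.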