
Notes on Solr

Title:
Date: 2022-11-21
Tags:  

Solr is a Java application that provides a search API you program against. It uses the Lucene library under the hood and is pretty straightforward to get started with.

A thorough tutorial can be found here:

https://github.com/hectorcorrea/solr-for-newbies/blob/main/tutorial.md

These are my barebones notes for getting started quickly.

A very basic setup requires having Java installed and then downloading Solr.

curl http://archive.apache.org/dist/lucene/solr/8.9.0/solr-8.9.0.zip > solr.zip

Next unzip it:

unzip solr.zip

Once unzipped, solr is ready to be used.

Start solr:

cd solr-8.9.0/
bin/solr start

This should start Solr, which is then accessible at localhost:8983.

Verify that solr is up:

bin/solr status

{
    "solr_home":"/home/nivethan/bp/solr-8.9.0/server/solr",
    "startTime":"2022-11-22T01:10:28.391Z",
    "uptime":"0 days, 1 hours, 17 minutes, 0 seconds",
    "memory":"206.4 MB (%40.3) of 512 MB"
}

Now we can create our first collection, also known as a core:

bin/solr create -c testdata

This will create the testdata collection.

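Now create a JSON file of the documents to index. There is some magic involved here: the _txt_en suffix on a field name matches a dynamic field in the default schema, which tells Solr to index that field as English text. Each document should also have an id field, which Solr uses as the unique key.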

Put this in test.json.

[
    { "id": 1, "blog_txt_en": "All the content to index" },
    { "id": 2, "blog_txt_en": "Another document" },
    { "id": 2, "blog_txt_en": "Test document" }
]
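
Since id is the unique key, documents that share an id replace one another, so the two entries with id 2 above leave only the last one in the index. A small Python sketch of that last-write-wins behaviour, handy for checking a file before posting it. The helper name dedupe_by_id is made up for illustration, and the code only mimics what Solr does; it does not talk to Solr.

```python
import json


def dedupe_by_id(docs):
    """Keep only the last document seen for each id (hypothetical helper)."""
    latest = {}
    for doc in docs:
        # The default schema stores id as a string, so 2 and "2" collide.
        latest[str(doc["id"])] = doc
    return list(latest.values())


docs = json.loads("""
[
    { "id": 1, "blog_txt_en": "All the content to index" },
    { "id": 2, "blog_txt_en": "Another document" },
    { "id": 2, "blog_txt_en": "Test document" }
]
""")

for doc in dedupe_by_id(docs):
    print(doc["id"], doc["blog_txt_en"])
```

Only two documents survive: id 1, and id 2 with the "Test document" text.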

Now we need to add these documents to the collection:

bin/post -c testdata test.json
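
bin/post is a wrapper that sends the file to the collection's update handler over HTTP. A rough Python equivalent, assuming Solr is on localhost:8983 and the collection is testdata as above. The function names are my own, and commit=true is added so the documents become searchable right away.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed default host and port


def build_update_request(collection, docs):
    """Build a POST request that sends a JSON array of documents to /update."""
    url = f"{SOLR_URL}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_file(collection, path):
    """Read a JSON file of documents and post it to a running Solr."""
    with open(path, encoding="utf-8") as f:
        docs = json.load(f)
    with urllib.request.urlopen(build_update_request(collection, docs)) as resp:
        return json.load(resp)
```

Calling index_file("testdata", "test.json") against a running Solr should return a response whose responseHeader has status 0.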

With that, Solr is ready to search!

The command below runs a search. Notice that we need to specify the field to search on.

curl 'http://localhost:8983/solr/testdata/select?q=blog_txt_en:"Test"'

We should see the following output:

{
    "responseHeader": {
        "status":0,
        "QTime":2,
        "params": {
            "q":"blog_txt_en:\"Test\""
        }
    },
    "response": {
        "numFound":1,
        "start":0,
        "numFoundExact":true,
        "docs":[{
            "id":"2",
            "blog_txt_en":"Test document"
        }]
    }
}
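
The same search can be run from code. A minimal Python sketch, assuming the same host and collection as the curl command; the function names are illustrative, and escaping of quotes inside the phrase is left out.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed default host and port


def build_select_url(collection, field, phrase):
    """Build a /select URL for a phrase query on a single field."""
    # Quote the phrase the same way the curl example does.
    query = f'{field}:"{phrase}"'
    params = urllib.parse.urlencode({"q": query})
    return f"{SOLR_URL}/{collection}/select?{params}"


def extract_docs(raw_json):
    """Return (numFound, docs) from a raw /select response body."""
    response = json.loads(raw_json)["response"]
    return response["numFound"], response["docs"]


def search(collection, field, phrase):
    """Run the query against a running Solr and return (numFound, docs)."""
    url = build_select_url(collection, field, phrase)
    with urllib.request.urlopen(url) as resp:
        return extract_docs(resp.read())
```

With the documents above indexed, search("testdata", "blog_txt_en", "Test") should come back with one hit.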

Now we have a working copy of solr.

A future goal is to search across all the fields, and to search without specifying a field at all. I'm not sure why the most common use case isn't easy to find an answer for. I did some light googling, but it looks to be more difficult than I expected.
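
One route that might work, untested against this exact setup: the default configset appears to ship with a catch-all field called _text_ that is also the default search field, so copying every field into it through the Schema API should allow queries without a field name. A sketch of that request in Python. The host, the collection name and the helper names are assumptions, and documents indexed before the change would need to be posted again.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed default host and port


def build_copy_field_request(collection, source="*", dest="_text_"):
    """Build a Schema API request that copies `source` fields into `dest`."""
    # Assumes the collection's schema already defines the `dest` field.
    payload = {"add-copy-field": {"source": source, "dest": dest}}
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_copy_field(collection):
    """Send the copy-field request to a running Solr and return its reply."""
    with urllib.request.urlopen(build_copy_field_request(collection)) as resp:
        return json.load(resp)
```

After add_copy_field("testdata") and re-posting test.json, a query such as q=Test with no field prefix should then search the catch-all field, provided _text_ really is the default field for the collection.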