Weaviate vector database - Ruby on Rails integration

Have you ever wondered how to build an image search engine (like Google Images or Myntra’s fashion product search)? In this blog post, we will create a simple image search feature in a Ruby on Rails application using the Weaviate vector database.

The modern approach to implementing image search involves vector embeddings. Leveraging the magic of neural networks and vector databases, we’ll explore how to realize vector-based image searching.

The popularity of vector databases has skyrocketed recently, especially for vector conversion, storage, and retrieval tasks. In this blog post, we will explore one such database, Weaviate, which offers neural network models like ResNet-50 (an embedding model) for vectorization.


What is a vector database?

A vector database is a type of database that stores data as high-dimensional vectors. Working with vector embeddings is complex, and traditional databases can’t keep up when it comes to providing insights and real-time analysis on this kind of data. That’s where vector DBs come into play: they are designed for handling this type of data and offer the performance, scalability, and flexibility you need to make the most of your data.

Vector database high-level workflow diagram

The flow looks like this. First, we use the embedding model to create vectors from the content. Next, the vector embeddings are inserted into the vector database, along with a reference to the original content. The application then interacts with the vector DB via the embedding model: when a query is issued, we use the same embedding model to create an embedding for the query and use it to search the database for similar vector embeddings.
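The workflow can be sketched in plain Ruby as a toy in-memory store. This is only an illustration under simplifying assumptions: the class ToyVectorStore and the hard-coded FAKE_EMBEDDINGS are hypothetical stand-ins for a real vector database and embedding model, and cosine similarity is just one common way to compare vectors.

```ruby
# Toy in-memory "vector database": stores [reference, vector] pairs and
# ranks them by cosine similarity to a query vector.
class ToyVectorStore
  Entry = Struct.new(:reference, :vector)

  def initialize
    @entries = []
  end

  # Store a vector together with a reference to the original content.
  def insert(reference, vector)
    @entries << Entry.new(reference, vector)
  end

  # Return the references of the `limit` most similar stored vectors.
  def nearest(query_vector, limit: 3)
    @entries
      .sort_by { |entry| -cosine_similarity(entry.vector, query_vector) }
      .first(limit)
      .map(&:reference)
  end

  private

  def cosine_similarity(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
    dot / (norm.call(a) * norm.call(b))
  end
end

# Hypothetical stand-in for an embedding model. Real models such as
# ResNet-50 produce vectors with far more than three dimensions.
FAKE_EMBEDDINGS = {
  'red_tshirt.jpg'  => [0.9, 0.1, 0.0],
  'blue_tshirt.jpg' => [0.8, 0.2, 0.1],
  'black_shoe.jpg'  => [0.0, 0.1, 0.9]
}.freeze

store = ToyVectorStore.new
FAKE_EMBEDDINGS.each { |name, vector| store.insert(name, vector) }

# Query with the (made-up) embedding of a new t-shirt image.
results = store.nearest([0.9, 0.1, 0.05], limit: 2)
puts results.inspect
```

A real vector database replaces the linear scan in nearest with an index built for approximate nearest-neighbour search, which is what keeps queries fast on large collections.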


Our primary goal is to construct a sophisticated fashion product search system, relying on image-based queries. To achieve this, we will utilize the Myntra fashion products dataset from Kaggle. Be sure to download the dataset from the provided link, as it is essential for building our image search application within the Ruby on Rails environment.

Now, let’s delve into the high-level steps we will be following:

  1. Implement the FashionProduct entity with an Active Storage attachment (product_image).
  2. Import the dataset from Kaggle into the fashion_products table of our application.
  3. Set up Weaviate using Docker and integrate it into the Rails application using the weaviate-ruby gem.
  4. Create a FashionProduct class in Weaviate and upload the FashionProduct records from PostgreSQL to Weaviate DB.
  5. Finally, develop a user interface with image search functionality, built on the Weaviate client’s query API.

Let’s start by creating a new Rails application (this project uses Rails 7.0.6 and Ruby 3.2.2). To ensure smooth development and deployment, we will employ Docker for managing the application and database environments (refer to this blog post’s GitHub repo for the local setup).

rails new fashion_products_vdb --database postgresql

Initialize ActiveStorage

To initialize ActiveStorage for the project, execute the following commands in your Ruby on Rails application:

rails active_storage:install
rails db:migrate

These commands will create the necessary tables for ActiveStorage attachments. Since we are using a local Docker setup for ActiveStorage attachments, no additional cloud providers’ setup is required.

Fashion Product entity creation

Let’s handle the FashionProduct entity. Begin by generating the model and its migration with the rails g model FashionProduct command.

After generating the migration file, modify both the migration and model files with the code below.

# db/migrate/<timestamp>_create_fashion_products.rb
class CreateFashionProducts < ActiveRecord::Migration[7.0]
    def change  
        create_table :fashion_products do |t| 
            # Product ID from the dataset 
            t.integer :p_id 
            # Metadata of the products from the dataset
            t.string :gender  
            t.string :master_category      
            t.string :sub_category  
            t.string :article_type  
            t.string :base_colour  
            t.string :name  
            t.string :usage  

            t.timestamps  
        end  
    end
end

# app/models/fashion_product.rb
class FashionProduct < ApplicationRecord
  has_one_attached :product_image
end

The FashionProduct entity should now have one ActiveStorage attachment, has_one_attached :product_image, to link and attach the image data from the dataset.

Migrate the fashion products data from the dataset

To accomplish this, you’ll need to set up a temporary volume for your application’s Docker service. Include the following configuration in your Docker Compose file:

fashion_products_vdb-web:  
    ...
    volumes:  
    - .:/fashion_products_vdb  
    - ./dataset:/dataset # Use this volume setup one time for the image dataset import process  
    ...

Place the downloaded dataset (images directory and styles.csv file) into the dataset folder created under your project’s root folder fashion_products_vdb/dataset. Afterward, restart the server, and the files under the dataset folder will be present inside the Docker container (through volume configuration).

Following that, create a service class that imports the data from the dataset folder into the application’s database.

# app/services/import_fashion_product_data_service.rb

require 'csv'

class ImportFashionProductDataService
  def initialize(dataset_path, metadata_file_name, image_dir)
    @dataset_path = dataset_path
    @csv_metadata_path = File.join(@dataset_path, metadata_file_name)
    @image_dir = File.join(@dataset_path, image_dir)
  end

  def call
    process_csv_data_import
  end

  private

  def process_csv_data_import
    line_number = 0

    begin
      CSV.foreach(@csv_metadata_path, headers: true) do |row_data|
        create_fashion_prd_from_metadata(format_metadata(row_data))
        line_number += 1
      end
    rescue StandardError => e
      puts "Error parsing CSV at line #{line_number}: #{e.message}"
    end
  end

  # Create a new record with the image from CSV metadata
  def create_fashion_prd_from_metadata(metadata)
    fsprd = create_fashion_prd_with(attributes: metadata)

    # Image attachment process
    image_file  = File.join(@image_dir, fsprd.p_id.to_s + '.jpg')
    puts image_file
    return unless File.exist?(image_file)

    fsprd.product_image.attach(io: File.open(image_file), filename: File.basename(image_file))
    fsprd.save
    puts "#{fsprd.name} created successfully."
  end

  def format_metadata(row_data)
    metadata = row_data.to_hash
    metadata = metadata.transform_keys do |k|
      case k.to_s
      when 'productDisplayName'
        'name'
      when 'id'
        'p_id'
      else
        k.to_s.underscore
      end
    end
    metadata
  end

  def create_fashion_prd_with(attributes:)
    FashionProduct.new.tap do |record|
      attributes.each do |k, v|
        next unless record.respond_to?(k + '=')

        record.send(k + '=', v)
      end
    end
  end
end

The above service class loops over every row of the dataset’s styles.csv file, which contains the product metadata. For each row, it looks up the corresponding image in the images folder (via the product’s id in the CSV file) and creates a FashionProduct entry from that data. Rows without a matching image file are not saved.
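To see what the header mapping produces without booting Rails, here is a minimal standalone sketch. The helper names snake_case and column_name_for are hypothetical, the snake_case helper is only a rough stand-in for ActiveSupport’s underscore, and the header row and sample values are assumptions modelled on the columns used in the migration.

```ruby
require 'csv'

# Plain-Ruby stand-in for ActiveSupport's String#underscore, sufficient
# for camelCase headers such as "masterCategory".
def snake_case(header)
  header.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Same mapping rules as format_metadata in the service class.
def column_name_for(header)
  case header
  when 'productDisplayName' then 'name'
  when 'id' then 'p_id'
  else snake_case(header)
  end
end

# Two illustrative rows (the header layout and values are assumed).
sample_csv = <<~ROWS
  id,gender,masterCategory,subCategory,articleType,baseColour,usage,productDisplayName
  1001,Men,Apparel,Topwear,Shirts,Navy Blue,Casual,Sample Navy Shirt
  1002,Women,Accessories,Watches,Watches,Silver,Casual,Sample Silver Watch
ROWS

rows = CSV.parse(sample_csv, headers: true).map do |row|
  row.to_h.transform_keys { |header| column_name_for(header) }
end

rows.each { |attributes| puts attributes.inspect }
```

Any CSV column that has no matching attribute on the model is silently ignored by the respond_to? check in create_fashion_prd_with, so extra columns in the file do not break the import.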

Open the rails console and call the service class instance’s call method once to import the data to the application’s database.

docker-compose exec fashion_products_vdb-web bash
rails console
ImportFashionProductDataService.new('/dataset', 'styles.csv', 'images').call

Weaviate Integration

For the Weaviate setup, we can use Weaviate’s docker-compose configurator to generate a docker-compose.yml file for our specific needs. Since we need image-to-vector conversion, use the config below and download the docker-compose file.

Weaviate docker-compose config generation

The docker-compose file will contain two services: the weaviate service (the VDB and Weaviate API service) and the image-to-vector neural network service (ResNet-50, PyTorch). Since we are using Docker for the Ruby application as well, combine all the services under one docker-compose file, like this:

---  
version: '3.4'  
services:  
    weaviate:  
        command:  
        - --host  
        - 0.0.0.0  
        - --port  
        - '8080'  
        - --scheme  
        - http  
        image: semitechnologies/weaviate:1.19.11  
        ports:  
        - 8080:8080  
        restart: on-failure:0  
        volumes:  
            - /var/weaviate:/var/lib/weaviate  
        environment:  
            IMAGE_INFERENCE_API: 'http://i2v-neural:8080'  
            QUERY_DEFAULTS_LIMIT: 25  
            AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'  
            PERSISTENCE_DATA_PATH: '/var/lib/weaviate'  
            DEFAULT_VECTORIZER_MODULE: 'img2vec-neural'  
            ENABLE_MODULES: 'img2vec-neural'  
            CLUSTER_HOSTNAME: 'node1'  
    i2v-neural:  
        image: semitechnologies/img2vec-pytorch:resnet50  
        environment:  
            ENABLE_CUDA: '0'  
    database:  
        image: postgres  
        container_name: database  
        env_file:  
        - .env  
        volumes:  
        - ./tmp/db:/var/lib/postgresql/data  
        ports:  
        - 6000:5432  
    fashion_products_vdb-web:  
        container_name: fashion_products_vdb-web  
        build: .  
        depends_on:  
        - database  
        - weaviate  
        - i2v-neural  
        env_file:  
        - .env  
        command: bash -c "bundle && rm -f /fashion_products_vdb/tmp/pids/server.pid && rails db:prepare && rails server -b 0.0.0.0"  
        volumes:  
        - .:/fashion_products_vdb  
        # - ./dataset:/dataset # Use this volume setup one time for the image dataset import process  
        ports:  
        - 3012:3000  
        tty: true  
        stdin_open: true  
...

In addition, the generated weaviate service does not include a volumes option by default, so we add the /var/weaviate:/var/lib/weaviate volume to persist Weaviate’s data across Docker container restarts.

After modifying the docker-compose file, run the docker-compose up command. It should download the images for the Weaviate-dependent services (note: the resnet-50 image is around 7GB).

Add the weaviate-ruby gem to the project’s Gemfile,

# Weaviate.io API ruby wrapper  
gem "weaviate-ruby"

Create a library class that builds a Weaviate client instance for our application:

# lib/weaviate_client.rb

require "weaviate"

class WeaviateClient
  # Creates a new WeaviateClient instance with the specified configuration
  def self.create_client
    Weaviate::Client.new(
      url: "http://weaviate:8080"             # Use ENV variables
    )
  end
end

Since we are using a local Docker setup, the client can reach Weaviate at the weaviate host (the Docker Compose service name) on port 8080. Weaviate also provides cloud instances, for which we need to generate API keys for client communication.
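The "Use ENV variables" comment in the client class can be followed up with a small helper like the one below. It is a sketch only: the method name weaviate_url and the variable names WEAVIATE_SCHEME, WEAVIATE_HOST, and WEAVIATE_PORT are hypothetical, and the defaults simply mirror the local Docker setup.

```ruby
# Build the Weaviate base URL from environment variables, falling back
# to the local Docker Compose defaults. The variable names are
# illustrative and not defined anywhere in the project.
def weaviate_url(env = ENV)
  scheme = env.fetch('WEAVIATE_SCHEME', 'http')
  host   = env.fetch('WEAVIATE_HOST', 'weaviate') # Docker Compose service name
  port   = Integer(env.fetch('WEAVIATE_PORT', '8080')) # fails fast on a non-numeric port

  "#{scheme}://#{host}:#{port}"
end

puts weaviate_url({})
puts weaviate_url({ 'WEAVIATE_HOST' => 'localhost', 'WEAVIATE_PORT' => '9090' })
```

The result could then be passed as the url: argument in WeaviateClient.create_client, with the values supplied through the .env file that the Docker Compose services already load.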

Add the lib path to the autoload_paths config in application.rb, config.autoload_paths += %W(#{config.root}/lib), so that the class file is auto-loaded and can be used in other parts of the application.

Create a FashionProduct class schema using rails migration that will hold our FashionProduct product_image vectors and ID.

rails g migration weaviate_create_fashion_product_class
# db/migrate/20230706115924_weaviate_create_fashion_product_class.rb

class WeaviateCreateFashionProductClass < ActiveRecord::Migration[7.0]
  def up
    class_name = 'FashionProduct'        # Name of the class (in vector DB)
    weaviate_client = WeaviateClient.create_client
    begin
      if weaviate_client.schema.get(class_name: class_name) != "Not Found"
        puts "Class '#{class_name}' already exists"
        return
      end

      weaviate_client.schema.create(
          class_name: class_name,
          vectorizer: 'img2vec-neural',   # Module used to vectorize the images
          module_config: {
            'img2vec-neural': {           # Weaviate's img2vec module
              'imageFields': [
                'image'
              ]
            }
          },
          properties: [                   # Properties of the VDB class
            {
              'name': 'image',
              'dataType': ['blob']
            },
            {
              'name': 'fashion_prd_id',
              'dataType': ['int']
            }
        ]
      )
    rescue => exception
      if weaviate_client.schema.get(class_name: class_name) != "Not Found"
        weaviate_client.schema.delete(class_name: class_name)
      end
      raise exception
    end
  end

  def down
    weaviate_client = WeaviateClient.create_client
    if weaviate_client.schema.get(class_name: 'FashionProduct') != "Not Found"
      weaviate_client.schema.delete(class_name: 'FashionProduct')
    end
  end
end

Weaviate’s schema contains the structure of the classes (similar to DB tables). Each class contains properties (similar to table columns). We are using two properties: image, with the blob data type (it holds the Base64-encoded image that the img2vec-neural module vectorizes), and fashion_prd_id, with the int data type. If we want to delete a class from the schema, along with all the data under that class, we can use the schema delete API, as in the down method of the migration.
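For reference, the class definition created by the migration corresponds roughly to the structure below, written as a plain Ruby hash in the camelCase shape used by Weaviate’s schema API. Treat it as an approximation for inspection rather than the exact payload the gem sends; the consistency check at the end is an extra illustration and not part of the project.

```ruby
require 'json'

# Approximate class definition for the FashionProduct class.
fashion_product_class = {
  'class' => 'FashionProduct',
  'vectorizer' => 'img2vec-neural',
  'moduleConfig' => {
    'img2vec-neural' => { 'imageFields' => ['image'] }
  },
  'properties' => [
    { 'name' => 'image', 'dataType' => ['blob'] },
    { 'name' => 'fashion_prd_id', 'dataType' => ['int'] }
  ]
}

# Sanity check: every field listed in imageFields should exist as a
# blob property, otherwise the vectorizer has nothing to read.
blob_properties = fashion_product_class['properties']
                  .select { |property| property['dataType'] == ['blob'] }
                  .map { |property| property['name'] }
image_fields = fashion_product_class.dig('moduleConfig', 'img2vec-neural', 'imageFields')
missing = image_fields - blob_properties

puts JSON.pretty_generate(fashion_product_class)
puts "imageFields without a blob property: #{missing.inspect}"
```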

Next, we import the images into Weaviate DB using the Weaviate objects batch create API. Weaviate handles the vector conversion of the images before storing them, using the resnet-50 inference service.

Create a one-time rake task to import the FashionProduct data into Weaviate DB.

task import_fashion_prd_data_to_weaviate: :environment do
  weaviate_client = WeaviateClient.create_client

  FashionProduct.find_in_batches(batch_size: 500) do |fpds|
    # Generate array with FashionProduct Base64 encoded image and ID values
    objects_to_upload = fpds.map do |fpd|
      {
        class: 'FashionProduct',
        properties: {
          image: Base64.strict_encode64(fpd.product_image_attachment.download).to_s,
          fashion_prd_id: fpd.id
        }
      }
    end

    # Weaviate.io objects API batch import
    p weaviate_client.objects.batch_create(
      objects: objects_to_upload
    )
  end

  puts "-- Total FashionProduct records: #{FashionProduct.count}"

  uploaded_count = weaviate_client.query.aggs(
    class_name: 'FashionProduct', 
    fields: 'meta { count }',
  )

  puts "-- Total objects uploaded to Weaviate DB: #{uploaded_count}"
end
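The payload shape that the task builds can be checked without Rails or a running Weaviate instance. The sketch below is only an illustration: the Product struct and the weaviate_objects_for helper are hypothetical, fake byte strings replace the real image files, and each_slice plays the role of find_in_batches.

```ruby
require 'base64'

# Minimal stand-in for a FashionProduct record and its image bytes.
Product = Struct.new(:id, :image_bytes)

# Build one Weaviate object hash per product, mirroring the rake task.
def weaviate_objects_for(products)
  products.map do |product|
    {
      class: 'FashionProduct',
      properties: {
        image: Base64.strict_encode64(product.image_bytes),
        fashion_prd_id: product.id
      }
    }
  end
end

# Fake binary data instead of real JPEG files.
products = (1..5).map { |id| Product.new(id, "fake-image-#{id}".b) }

# Split the records into batches of two objects each.
batches = products.each_slice(2).map { |slice| weaviate_objects_for(slice) }

batches.each_with_index do |batch, index|
  puts "batch #{index + 1}: #{batch.size} objects"
end
```

Because every object carries a full Base64-encoded image, the batch size directly controls the size of each request body, so a smaller batch_size than 500 may be worth trying if the uploads time out.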

We are heading towards the final stage, searching through the vector image data with an image input. Using the Weaviate client’s query API we can get the matching image records, for which we will be passing a Base64 encoded image value for the near_image attribute.

require 'open-uri'
weaviate_client_inst = WeaviateClient.create_client

test_img_url = '<URL of the image>'
base_64_img_string = Base64.strict_encode64(URI.parse(test_img_url).read)

result = weaviate_client_inst.query.get(
  class_name: 'FashionProduct',
  limit: '5',
  offset: '1',
  near_image: "{ image: \"#{base_64_img_string}\" }",
  fields: 'fashion_prd_id'
)

## result
[{"fashion_prd_id"=>10655}, {"fashion_prd_id"=>3077}, {"fashion_prd_id"=>207}, {"fashion_prd_id"=>11834}, {"fashion_prd_id"=>7883}]


## FashionProducts stored in application database
FashionProduct.where(id: result.map { |prd_val| prd_val['fashion_prd_id'] })
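One detail worth keeping in mind: a where(id: [...]) lookup generally does not guarantee that rows come back in the order of the given IDs, so the similarity ranking from Weaviate can be lost. A minimal sketch of restoring that order in plain Ruby follows; the Record struct and the order_by_ranking helper are hypothetical stand-ins for the FashionProduct records.

```ruby
# Stand-in for ActiveRecord rows; a real query would return
# FashionProduct records instead.
Record = Struct.new(:id, :name)

# Reorder records so they follow the ranking in ranked_ids.
# IDs without a matching record are skipped.
def order_by_ranking(records, ranked_ids)
  by_id = records.to_h { |record| [record.id, record] }
  ranked_ids.filter_map { |id| by_id[id] }
end

# IDs in the order returned by the vector search (most similar first).
ranked_ids = [10655, 3077, 207]

# Rows as a database might return them, for example in primary-key order.
db_rows = [Record.new(207, 'C'), Record.new(3077, 'B'), Record.new(10655, 'A')]

ordered = order_by_ranking(db_rows, ranked_ids)
puts ordered.map(&:name).inspect
```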

Finally, glue the core VDB query logic together with the FashionProduct model and the controller. We build a simple UI that renders a form to get an image file from the user and, based on the image search, displays all the matching FashionProduct items. Refer to the commits (FashionProducts UI dev and Image search controller with UI).
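One possible way to keep the controller thin is to wrap the query in a small service object. The sketch below is not the code from the linked commits: ImageSearchService, FakeQuery, and FakeClient are hypothetical names, the fake client exists only so the example runs without a Weaviate server, and the offset argument from the earlier query is left out.

```ruby
require 'base64'
require 'stringio'

# Hypothetical service object; the class name and interface are illustrative.
class ImageSearchService
  def initialize(client:, limit: 5)
    @client = client
    @limit = limit
  end

  # image_io: any IO-like object, for example an uploaded file.
  # Returns the fashion_prd_id values of the closest matches.
  def call(image_io)
    encoded = Base64.strict_encode64(image_io.read)
    results = @client.query.get(
      class_name: 'FashionProduct',
      limit: @limit.to_s,
      near_image: "{ image: \"#{encoded}\" }",
      fields: 'fashion_prd_id'
    )
    results.map { |row| row['fashion_prd_id'] }
  end
end

# Fake client that records its arguments and returns canned results.
class FakeQuery
  attr_reader :last_arguments

  def get(**arguments)
    @last_arguments = arguments
    [{ 'fashion_prd_id' => 10655 }, { 'fashion_prd_id' => 3077 }]
  end
end

FakeClient = Struct.new(:query)

fake_query = FakeQuery.new
service = ImageSearchService.new(client: FakeClient.new(fake_query))
ids = service.call(StringIO.new('fake-image-bytes'))
puts ids.inspect
```

In the application, the client would be WeaviateClient.create_client and image_io would be the file uploaded through the search form.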

The world of vector databases is vast and promising. Unlock new horizons in your company by leveraging this technology for enhanced customer experiences, content curation, and targeted advertising.

A zealous programmer’s fervor fuels a perpetual quest for knowledge and mastery in the realm of coding


References

Code for the project – Fashion Products Image Search (ROR) / iNBAkRISH

https://medium.com/vector-database/what-are-vector-databases-8100178c5774

https://medium.com/@rushing_andrei/adding-intelligent-search-to-a-rails-application-56d3ed794b8a


Fuel Your Engineering Curiosity: Visit https://engineering.rently.com/

Get to know about Rently at https://use.rently.com/
