Similarity_evaluation#

similarity_evaluation.distance#

class gptcache.similarity_evaluation.distance.SearchDistanceEvaluation(max_distance=4.0, positive=False)[source]#

Using search distance to evaluate sentence pair similarity.

This is the evaluator that compares two embeddings according to the distance computed in the embedding retrieval stage. In the retrieval stage, search_result is the distance used for approximate nearest neighbor search and has been put into cache_dict. max_distance is used to bound this distance to the range [0, max_distance]. positive indicates whether this distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.

Parameters
  • max_distance (float) – the bound of maximum distance.

  • positive (bool) – True if a larger distance indicates greater similarity between two entities; otherwise False.

Example

from gptcache.similarity_evaluation import SearchDistanceEvaluation

evaluation = SearchDistanceEvaluation()
score = evaluation.evaluation(
    {},
    {
        "search_result": (1, None)
    }
)
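
The clamp-and-subtract rule described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation:

def sketch_score(distance: float, max_distance: float = 4.0, positive: bool = False) -> float:
    # Clamp the raw search distance to [0, max_distance].
    distance = max(0.0, min(distance, max_distance))
    # With positive=False, a smaller distance should yield a larger score,
    # so the clamped distance is subtracted from max_distance.
    return distance if positive else max_distance - distance

sketch_score(1.0)  # 3.0 with the defaults above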
evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#

Evaluate the similarity score of the pair.

Parameters
  • src_dict (Dict) – the query dictionary to evaluate with cache.

  • cache_dict (Dict) – the cache dictionary.

Returns

evaluation score.

range() → Tuple[float, float][source]#

Range of similarity score.

Returns

minimum and maximum of similarity score.

similarity_evaluation.exact_match#

class gptcache.similarity_evaluation.exact_match.ExactMatchEvaluation[source]#

Using exact match to evaluate sentence pair similarity.

This evaluator directly compares two questions as text. If the two questions match character for character, the evaluator returns 1; otherwise it returns 0.

Example

from gptcache.similarity_evaluation import ExactMatchEvaluation

evaluation = ExactMatchEvaluation()
score = evaluation.evaluation(
    {
        "question": "What is the color of sky?"
    },
    {
        "question": "What is the color of sky?"
    }
)
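
Conceptually, the check above reduces to string equality on the question field; a minimal sketch (not the library source):

score = 1.0 if src_dict["question"] == cache_dict["question"] else 0.0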
evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#

Evaluate the similarity score of the pair.

Parameters
  • src_dict (Dict) – the query dictionary to evaluate with cache_dict.

  • cache_dict (Dict) – the cache dictionary.

Returns

evaluation score.

range() → Tuple[float, float][source]#

Range of similarity score.

Returns

minimum and maximum of similarity score.

similarity_evaluation.kreciprocal#

class gptcache.similarity_evaluation.kreciprocal.KReciprocalEvaluation(vectordb: gptcache.manager.vector_data.base.VectorBase, top_k: int = 3, max_distance: float = 4.0, positive: bool = False)[source]#

Using K-reciprocal reranking to evaluate sentence pair similarity.

This evaluator borrows the popular reranking method K-reciprocal reranking for similarity evaluation. A K-reciprocal relation refers to the mutual nearest neighbor relationship between two embeddings, where each embedding is among the K nearest neighbors of the other under a given distance metric. This evaluator checks whether the query embedding is among the candidate cache embedding's top_k nearest neighbors. If it is not, the pair is considered dissimilar. Otherwise, the distance is kept and passed on to a SearchDistanceEvaluation check. max_distance is used to bound this distance to the range [0, max_distance]. positive indicates whether this distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.

Parameters
  • vectordb (gptcache.manager.vector_data.base.VectorBase) – vector database to retrieval embeddings to test k-reciprocal relationship.

  • top_k (int) – for each retrieved candidate, this method tests whether the query is among the candidate's top-k nearest neighbors.

  • max_distance (float) – the bound of maximum distance.

  • positive (bool) – True if a larger distance indicates greater similarity between two entities; otherwise False.

Example

from gptcache.similarity_evaluation import KReciprocalEvaluation
from gptcache.manager.vector_data.faiss import Faiss
from gptcache.manager.vector_data.base import VectorData
import numpy as np

faiss = Faiss('./none', 3, 10)
cached_data = np.array([0.57735027, 0.57735027, 0.57735027])
faiss.mul_add([VectorData(id=0, data=cached_data)])
evaluation = KReciprocalEvaluation(vectordb=faiss, top_k=2, max_distance = 4.0, positive=False)
query = np.array([0.61396013, 0.55814557, 0.55814557])
score = evaluation.evaluation(
    {
        'question': 'question1',
        'embedding': query
    },
    {
        'question': 'question2',
        'embedding': cached_data
    }
)
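
A rough sketch of the check described above. The vector store interface here (a search method returning (distance, id) pairs) and the squared-L2 metric are assumptions for illustration, not the library's API:

import numpy as np

def k_reciprocal_score(query_emb, cand_emb, vectordb, top_k=3, max_distance=4.0):
    # Retrieve the candidate's top_k nearest stored neighbors.
    results = vectordb.search(cand_emb, top_k)  # assumed: [(distance, id), ...]
    kth_distance = results[-1][0]
    # Squared L2 distance between query and candidate (metric assumed).
    d = float(np.sum((query_emb - cand_emb) ** 2))
    if d > kth_distance:
        # Query falls outside the candidate's top_k radius: dissimilar pair.
        return 0.0
    # Otherwise apply the same clamp-and-subtract rule as SearchDistanceEvaluation.
    d = max(0.0, min(d, max_distance))
    return max_distance - d  # positive=False behavior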
evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#

Evaluate the similarity score of the pair.

Parameters
  • src_dict (Dict) – the query dictionary to evaluate with cache.

  • cache_dict (Dict) – the cache dictionary.

Returns

evaluation score.

static normalize(vec: numpy.ndarray)[source]#

Normalize the input vector.

Parameters

vec (numpy.ndarray) – the numpy vector to normalize.

Returns

normalized vector.
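
The normalization here is standard unit-length scaling; a one-line sketch (not the library source):

import numpy as np

def normalize(vec: np.ndarray) -> np.ndarray:
    # Scale the vector to unit L2 norm.
    return vec / np.linalg.norm(vec)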

similarity_evaluation.np#

class gptcache.similarity_evaluation.np.NumpyNormEvaluation(enable_normal: bool = True)[source]#

Using Numpy norm to evaluate sentence pair similarity.

This evaluator calculates the L2 distance between two embeddings for the similarity check. If enable_normal is True, both the query embedding and the cache embedding are normalized first.

Parameters

enable_normal (bool) – whether to normalize the embedding, defaults to True.

Example

from gptcache.similarity_evaluation import NumpyNormEvaluation
import numpy as np

evaluation = NumpyNormEvaluation()
score = evaluation.evaluation(
    {
        'question': 'What is color of sky?',
        'embedding': np.array([-0.5, -0.5])
    },
    {
        'question': 'What is the color of sky?',
        'embedding': np.array([-0.49, -0.51])
    }
)
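
A minimal sketch of the distance described above, assuming plain numpy (how the raw distance maps to the final score is left to the implementation and range()):

import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray, enable_normal: bool = True) -> float:
    if enable_normal:
        # Normalize both embeddings to unit length first.
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
    return float(np.linalg.norm(a - b))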
evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#

Evaluate the similarity score of the pair.

Parameters
  • src_dict (Dict) – the query dictionary to evaluate with cache.

  • cache_dict (Dict) – the cache dictionary.

Returns

evaluation score.

static normalize(vec: numpy.ndarray)[source]#

Normalize the input vector.

Parameters

vec (numpy.ndarray) – the numpy vector to normalize.

Returns

normalized vector.

range() → Tuple[float, float][source]#

Range of similarity score.

Returns

minimum and maximum of similarity score.

similarity_evaluation.onnx#

class gptcache.similarity_evaluation.onnx.OnnxModelEvaluation(model: str = 'GPTCache/albert-duplicate-onnx')[source]#

Using an ONNX model to evaluate sentence pair similarity.

This evaluator uses an ONNX model to evaluate the similarity of two sentences.

Parameters

model (str) – model name of OnnxModelEvaluation. Default is ‘GPTCache/albert-duplicate-onnx’.

Example

from gptcache.similarity_evaluation import OnnxModelEvaluation

evaluation = OnnxModelEvaluation()
score = evaluation.evaluation(
    {
        'question': 'What is the color of sky?'
    },
    {
        'question': 'hello'
    }
)
evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#

Evaluate the similarity score of the pair.

Parameters
  • src_dict (Dict) – the query dictionary to evaluate with cache.

  • cache_dict (Dict) – the cache dictionary.

Returns

evaluation score.

inference(reference: str, candidates: List[str]) → numpy.ndarray[source]#

Run inference with the ONNX model.

Parameters
  • reference (str) – reference sentence.

  • candidates (List[str]) – candidate sentences.

Returns

probability scores indicating how similar the reference is to each candidate.
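
A hypothetical usage sketch, assuming only the signature documented above (the sentences are illustrative, with evaluation as constructed in the earlier example):

probs = evaluation.inference(
    reference="What is the color of sky?",
    candidates=["hello", "What color is the sky?"],
)
# probs holds one probability per candidate.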

range() → Tuple[float, float][source]#

Range of similarity score.

Returns

minimum and maximum of similarity score.

similarity_evaluation.similarity_evaluation#