Similarity_evaluation#
Index
similarity_evaluation.distance#
- class gptcache.similarity_evaluation.distance.SearchDistanceEvaluation(max_distance=4.0, positive=False)[source]#
Using search distance to evaluate sentence pair similarity.
This evaluator compares two embeddings by the distance computed during the embedding retrieval stage. In that stage, search_result is the distance used for approximate nearest neighbor search, and it has been put into cache_dict. max_distance bounds this distance to the range [0, max_distance]. positive indicates that the distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.
- Parameters
max_distance (float) – the bound of maximum distance.
positive (bool) – True if a larger distance indicates greater similarity between two entities; otherwise False.
Example
```python
from gptcache.similarity_evaluation import SearchDistanceEvaluation

evaluation = SearchDistanceEvaluation()
score = evaluation.evaluation(
    {},
    {"search_result": (1, None)}
)
```
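The scoring rule described above can be sketched in plain Python. The helper `search_distance_score` below is hypothetical (not part of gptcache); it assumes the raw distance is first clamped to [0, max_distance] and then, when positive is False, subtracted from max_distance:

```python
def search_distance_score(distance: float, max_distance: float = 4.0,
                          positive: bool = False) -> float:
    """Sketch of the search-distance scoring rule described above."""
    # Clamp the raw retrieval distance into [0, max_distance].
    distance = min(max(distance, 0.0), max_distance)
    # With positive=False, a smaller distance yields a higher score.
    return distance if positive else max_distance - distance
```

With the defaults, a retrieval distance of 1.0 maps to a score of 3.0, and any distance beyond max_distance maps to 0.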
similarity_evaluation.exact_match#
- class gptcache.similarity_evaluation.exact_match.ExactMatchEvaluation[source]#
Using exact match to evaluate sentence pair similarity.
This evaluator directly compares two questions as raw text. If the two questions match character for character, the evaluator returns 1; otherwise it returns 0.
Example
```python
from gptcache.similarity_evaluation import ExactMatchEvaluation

evaluation = ExactMatchEvaluation()
score = evaluation.evaluation(
    {"question": "What is the color of sky?"},
    {"question": "What is the color of sky?"}
)
```
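The character-for-character rule above is simple enough to sketch directly (hypothetical helper, not part of gptcache):

```python
def exact_match_score(src_question: str, cache_question: str) -> int:
    """Return 1 only when the two questions match character for character."""
    return 1 if src_question == cache_question else 0
```

Note that any difference at all, including whitespace or casing, yields 0.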
similarity_evaluation.kreciprocal#
- class gptcache.similarity_evaluation.kreciprocal.KReciprocalEvaluation(vectordb: gptcache.manager.vector_data.base.VectorBase, top_k: int = 3, max_distance: float = 4.0, positive: bool = False)[source]#
Using K-reciprocal reranking to evaluate sentence pair similarity.
This evaluator borrows the popular K-reciprocal reranking method for similarity evaluation. A K-reciprocal relation refers to the mutual nearest-neighbor relationship between two embeddings, where each embedding is among the K nearest neighbors of the other under a given distance metric. This evaluator checks whether the query embedding is in the candidate cache embedding’s top_k nearest neighbors. If it is not, the pair is considered dissimilar. Otherwise, the distance is kept and passed on to a SearchDistanceEvaluation check. max_distance bounds this distance to the range [0, max_distance]. positive indicates that the distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.
- Parameters
vectordb (gptcache.manager.vector_data.base.VectorBase) – vector database used to retrieve embeddings for testing the k-reciprocal relationship.
top_k (int) – for each retrieved candidate, this method tests whether the query is among the candidate’s top-k nearest neighbors.
max_distance (float) – the bound of maximum distance.
positive (bool) – True if a larger distance indicates greater similarity between two entities; otherwise False.
Example
```python
from gptcache.similarity_evaluation import KReciprocalEvaluation
from gptcache.manager.vector_data.faiss import Faiss
from gptcache.manager.vector_data.base import VectorData
import numpy as np

faiss = Faiss('./none', 3, 10)
cached_data = np.array([0.57735027, 0.57735027, 0.57735027])
faiss.mul_add([VectorData(id=0, data=cached_data)])
evaluation = KReciprocalEvaluation(vectordb=faiss, top_k=2, max_distance=4.0, positive=False)
query = np.array([0.61396013, 0.55814557, 0.55814557])
score = evaluation.evaluation(
    {'question': 'question1', 'embedding': query},
    {'question': 'question2', 'embedding': cached_data}
)
```
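The k-reciprocal check described above can be sketched without a vector database. The helper below is hypothetical and simplified: it assumes plain (non-squared) L2 distances, takes the other cached embeddings as a NumPy array, and applies the positive=False search-distance rule when the mutual-neighbor test passes:

```python
import numpy as np

def k_reciprocal_score(query, candidate, other_cached, top_k=3,
                       max_distance=4.0):
    """Sketch of the k-reciprocal check, followed by the
    positive=False search-distance rule."""
    # Distance from the candidate to every other cached embedding,
    # and from the candidate to the query.
    dists_to_cached = np.linalg.norm(other_cached - candidate, axis=1)
    dist_to_query = np.linalg.norm(query - candidate)
    # The query must rank within the candidate's top_k nearest neighbors.
    if np.sum(dists_to_cached < dist_to_query) >= top_k:
        return 0.0  # not mutual neighbors: treat the pair as dissimilar
    # Otherwise fall back to the search-distance rule.
    dist = min(dist_to_query, max_distance)
    return max_distance - dist
```

A nearby query passes the mutual-neighbor test and gets a high score; a far-away query either fails the test or is clamped to a score of 0.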
similarity_evaluation.np#
- class gptcache.similarity_evaluation.np.NumpyNormEvaluation(enable_normal: bool = True)[source]#
Using NumPy norm to evaluate sentence pair similarity.
This evaluator calculates the L2 distance between two embeddings for the similarity check. If enable_normal is True, both the query embedding and the cache embedding are normalized first.
- Parameters
enable_normal (bool) – whether to normalize the embeddings, defaults to True.
Example
```python
from gptcache.similarity_evaluation import NumpyNormEvaluation
import numpy as np

evaluation = NumpyNormEvaluation()
score = evaluation.evaluation(
    {'question': 'What is color of sky?', 'embedding': np.array([-0.5, -0.5])},
    {'question': 'What is the color of sky?', 'embedding': np.array([-0.49, -0.51])}
)
```
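The normalize-then-measure step described above can be sketched as follows. The mapping from distance to a bounded score is an assumption for illustration (normalized vectors are at most 2 apart), not necessarily gptcache's exact formula:

```python
import numpy as np

def numpy_norm_score(src_emb, cache_emb, enable_normal=True):
    """Sketch: L2 distance between (optionally normalized) embeddings,
    mapped to a similarity in [0, 1]. The mapping is an assumption."""
    if enable_normal:
        src_emb = src_emb / np.linalg.norm(src_emb)
        cache_emb = cache_emb / np.linalg.norm(cache_emb)
    distance = np.linalg.norm(src_emb - cache_emb)
    # Unit vectors are at most 2 apart, so distance / 2 lies in [0, 1].
    return 1.0 - distance / 2.0
```

Identical directions score 1.0; opposite directions score 0.0.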
- evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#
Evaluate the similarity score of pair.
- Parameters
src_dict (Dict) – the query dictionary to evaluate against the cache.
cache_dict (Dict) – the cache dictionary.
- Returns
evaluation score.
similarity_evaluation.onnx#
- class gptcache.similarity_evaluation.onnx.OnnxModelEvaluation(model: str = 'GPTCache/albert-duplicate-onnx')[source]#
Using an ONNX model to evaluate sentence pair similarity.
This evaluator uses an ONNX model to evaluate the similarity of two sentences.
- Parameters
model (str) – model name of OnnxModelEvaluation. Default is ‘GPTCache/albert-duplicate-onnx’.
Example
```python
from gptcache.similarity_evaluation import OnnxModelEvaluation

evaluation = OnnxModelEvaluation()
score = evaluation.evaluation(
    {'question': 'What is the color of sky?'},
    {'question': 'hello'}
)
```
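Binary duplicate classifiers like the one above typically emit a pair of logits (not-duplicate, duplicate), and the similarity score is the softmax probability of the duplicate class. The sketch below shows only that generic final step with made-up logits; it is not gptcache's exact post-processing:

```python
import numpy as np

def duplicate_probability(logits):
    """Softmax over (not-duplicate, duplicate) logits; returns P(duplicate).
    Generic sketch of how a binary cross-encoder's output becomes a score."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return probs[1]

# Made-up logits for illustration: the model is confident the pair matches.
score = duplicate_probability([-2.0, 2.0])
```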
- evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#
Evaluate the similarity score of pair.
- Parameters
src_dict (Dict) – the query dictionary to evaluate against the cache.
cache_dict (Dict) – the cache dictionary.
- Returns
evaluation score.