graph LR
segment -- create_query --> query
query -- retrieve --> segment
Self-supervised evaluation
Retrieval evaluation with synthetic queries
Even if you lack labeled data, you can often still run a meaningful evaluation using synthetic data.
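For example, with a segmented text dataset you can build a set of (query, segment) pairs, then score the retriever on whether each query brings back its source segment: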
loop:
    choose arbitrary segment
    query = create_query(segment)
    save (query, segment) to dataset
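The loop above can be sketched in Python as follows. This is a minimal illustration and not a definitive implementation: `create_query` here is a hypothetical stand-in that samples a few terms from the segment (in practice it would more likely be an LLM call), `retrieve` is a toy term-overlap ranker standing in for the system under test, and `SEGMENTS`, `build_dataset` and `hit_rate` are names invented for the example.

```python
import random

# Toy corpus; in practice these would be the segments of your own dataset.
SEGMENTS = [
    "the cat sat on the warm windowsill",
    "interest rates rose again this quarter",
    "the recipe calls for two cups of flour",
    "the rover landed on the red planet",
]

def tokenize(text):
    return text.lower().split()

def create_query(segment, rng, n_terms=3):
    """Hypothetical query generator: sample a few distinct terms from the segment."""
    terms = sorted(set(tokenize(segment)))
    return " ".join(rng.sample(terms, min(n_terms, len(terms))))

def retrieve(query, segments, k=1):
    """Toy retriever (an assumption): rank segments by number of shared terms."""
    q = set(tokenize(query))
    ranked = sorted(
        range(len(segments)),
        key=lambda i: len(q & set(tokenize(segments[i]))),
        reverse=True,
    )
    return ranked[:k]

def build_dataset(segments, n, seed=0):
    """The loop from the text: pick a segment, generate a query, save the pair."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        idx = rng.randrange(len(segments))  # choose arbitrary segment
        query = create_query(segments[idx], rng)
        dataset.append((query, idx))  # the segment is stored by its index
    return dataset

def hit_rate(dataset, segments, k=1):
    """Fraction of queries whose source segment appears in the top k results."""
    hits = sum(idx in retrieve(query, segments, k) for query, idx in dataset)
    return hits / len(dataset)

dataset = build_dataset(SEGMENTS, n=20)
print(f"hit rate@1: {hit_rate(dataset, SEGMENTS, k=1):.2f}")
```

Storing the segment's index rather than its text keeps the comparison at evaluation time unambiguous. The hit rate at k is one reasonable score for this setup; other ranking metrics would slot into the same loop.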
Synthetic data and property-based testing
When synthetic data is used for training or fine-tuning, it makes sense to think of it as data. But in the context of evaluation, you can also think of this process as a kind of stochastic property-based testing: we check that a circuit we expect to exist, based on our understanding of the problem, does in fact close. Here the circuit runs from a segment to a generated query and back to the same segment.
It turns out there’s a whole lore around property-based testing, often exploiting formal properties of domain objects and operations on them. E.g. (following Wlaschin 2014):
- commutative relationships (and model-based approaches),
- invertible operations,
- invariance under transformation,
- idempotence, and
- structural induction.
The example given above fits the ‘invertible operations’ paradigm, where the operations are ‘given a query, retrieve a responsive segment’ and ‘given a segment, generate a plausible query’. But really, for lots of properties involving a generation step you could collect a dataset of generated examples and call it synthetic evaluation data.
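To make the property-based framing concrete, here is a small hand-rolled sketch using only the Python standard library. It checks two of the properties listed above against randomly generated cases: the round trip (invertible operations) and invariance of the top result under a cosmetic change to the query. Everything in it is an assumption made for illustration: the toy corpus, the term-sampling `create_query`, the overlap-based `retrieve_top`, and the `check_property` helper. A dedicated property-based testing library would replace the helper in real use.

```python
import random

SEGMENTS = [
    "the cat sat on the warm windowsill",
    "interest rates rose again this quarter",
    "the recipe calls for two cups of flour",
    "the rover landed on the red planet",
]

def tokenize(text):
    return text.lower().split()

def create_query(segment, rng, n_terms=3):
    """Hypothetical generator: sample a few distinct terms from the segment."""
    terms = sorted(set(tokenize(segment)))
    return " ".join(rng.sample(terms, min(n_terms, len(terms))))

def retrieve_top(query, segments):
    """Toy retriever (an assumption): index of the segment sharing the most terms."""
    q = set(tokenize(query))
    return max(range(len(segments)),
               key=lambda i: len(q & set(tokenize(segments[i]))))

def gen_case(rng):
    """Generate one test case: a segment index and a query derived from it."""
    idx = rng.randrange(len(SEGMENTS))
    return idx, create_query(SEGMENTS[idx], rng)

def prop_round_trip(case):
    """Invertible operations: the generated query retrieves its source segment."""
    idx, query = case
    return retrieve_top(query, SEGMENTS) == idx

def prop_case_invariant(case):
    """Invariance: upper-casing and extra spaces do not change the top result."""
    _, query = case
    noisy = query.upper().replace(" ", "  ")
    return retrieve_top(noisy, SEGMENTS) == retrieve_top(query, SEGMENTS)

def check_property(prop, gen, trials=100, seed=0):
    """Run prop on randomly generated cases and return the counterexamples."""
    rng = random.Random(seed)
    cases = [gen(rng) for _ in range(trials)]
    return [case for case in cases if not prop(case)]

for prop in (prop_round_trip, prop_case_invariant):
    failures = check_property(prop, gen_case)
    print(f"{prop.__name__}: {len(failures)} counterexamples in 100 trials")
```

With a real retriever and a real query generator, the round trip will not hold on every trial, so the counterexample rate is the quantity of interest. The saved counterexamples are exactly the synthetic evaluation data the text describes.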
References:
- Claessen, K. and Hughes, J. (2000). QuickCheck: a lightweight tool for random testing of Haskell programs. ACM SIGPLAN Notices, Vol. 35, Issue 9, pp. 268-279.
- Es, S. (2024). All about synthetic data generation. Ragas blog.
- Esfandiarpoor, R. et al. (2025). Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance. arXiv:2503.23239 [cs.IR].
- Rahmani, H. (2024). Synthetic Test Collections for Retrieval Evaluation. arXiv:2405.07767 [cs.IR].
- Wlaschin, S. (2014). Choosing properties for property-based testing. F# Blog.