The GenAI Testing solution for real people

LLM app testing & analytics platform. Designed to be loved by all teams. Powerful enough for any application. Simple enough for any user.

It's tough to build high-performing GenAI apps you can trust

Teams currently spend huge amounts of time iterating across prompts, models & architectures. And the only way they can evaluate performance is with manual & subjective ‘testing by vibes’.

Composo makes GenAI testing easy

With Composo, teams can rapidly achieve high performance, guarantee accuracy & minimise the cost of their GenAI applications.


Build around your workflow

No additional steps, complexity, or changes to how you already work


Always see the big picture

Full end-to-end testing of your entire workflow.

No steep learning curve

Link to your application's codebase and start getting insights straight away


Everything in one place

Effortlessly evaluate and optimize GenAI applications in a simple yet powerful platform, featuring a playground and a powerful testing & evaluation suite.


Get started straight away with our LLM model playground & AI prompt writer. It works immediately out of the box, no setup required.

Testing & Evaluation Suite

Conduct powerful, automated testing & iteration of a linked application. Just a few lines of code to set up, then use Composo with no technical ability required.

Built for performance, designed for everyone

Composo is the most powerful GenAI tool you don't have to be a developer to use.

AI prompt writer

Just type in a prompt, choose a target model to optimise for & press play. Works with GPT-4, Claude, DALL-E, Midjourney and more.

LLM model playground

Chat with your own app, or directly with all major open & closed source LLMs (e.g. OpenAI, Anthropic, Gemini & Llama) in a simple-to-use playground.

Rigorous automated evaluation

Evaluate performance across a range of metrics, e.g. ground truth pairs, vector similarity, validity of code & the research-backed Composo AI critic.
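
As a rough illustration of one of these metrics, vector similarity is commonly computed as the cosine similarity between two embedding vectors. The sketch below is a generic example and not Composo's implementation; the function name and the toy vectors are assumptions, and in practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors.

    Hypothetical helper for illustration only; returns 0.0 if either
    vector has zero length so the caller never divides by zero.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of an app output and a reference answer.
output_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.65, 0.05]
score = cosine_similarity(output_vec, reference_vec)
print(f"similarity: {score:.3f}")
```

A score close to 1 suggests the output and the reference point in a similar direction in embedding space; what counts as a passing threshold depends on the application.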

Preset & custom tests

Use Composo’s built-in tests for harmful output, resistance to prompt injection & more. Or build your own.

No limit to app complexity

Our codebase integration means you can test agents, RAG or anything else, fully end to end. No need to be constrained by low-code app builders.

Rapid set-up

AI prompt writing & the LLM playground work immediately. Linking your own app takes just 5 minutes & a few lines of code.

A smooth, yet powerful workflow


Bring your whole team

Composo is human-friendly and tailored for collaboration across your teams. Everyone can work together in a simple UI. Designed to be loved by developers & non-technical teams alike.


Bring your team to Composo

Now your whole team can participate


Our Team


Sebastian Fox


Ex-McKinsey & QuantumBlack
Oxford University


Luke Markham


Ex-Graphcore ML Engineer
Oxford University


Armin Sommer

Founding Engineer

Software founder at 18
Computer Science ETH Zurich

Ready to try Composo?

No additional steps, complexity, or changes to how you already work. Start using Composo today and build the future of AI.



We’re here to answer your questions and make your life simpler.

How does Composo work?

Composo simplifies & automates the process of testing and optimizing LLM-based applications to achieve high quality, accuracy, and safety.

Composo links to an application with just a few lines of code, and then enables you to access & iterate on all the core components of a GenAI system (e.g. prompts, models, and RAG settings).

You can use Composo both for quick, ad-hoc iteration and for rigorous, systematic testing that evaluates how different configurations of an application affect the quality of generated outputs.

One of the key benefits of Composo is that it requires no code to use once set up. This means that anyone can run tests and optimize an application while an engineer simultaneously continues to build out the codebase as normal. This is particularly crucial in instances where someone other than the developer needs to be able to test, evaluate & provide feedback on an application (e.g. legal, HR, medical, customer service applications).

How do you determine accuracy & detect hallucinations?

To determine accuracy and detect hallucinations, we employ a range of methods including:

1. Comparison to ground-truth or gold standard answers: We evaluate the model's output against verified, correct responses to assess its accuracy.

2. RAG metrics: Retrieval-augmented generation can be quantified with a range of measures, such as faithfulness to source material (i.e. the number of claims in the answer that are supported by the underlying source material), relevance to the query, and precision & recall of the context provided.

3. AI critic: We use other language models to assess the quality and accuracy of the generated output. This is powered by the Composo AI critic which is built to identify hallucinations or inconsistencies.
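
The first two methods above can be sketched in a few lines of Python. This is a minimal, generic illustration and not Composo's implementation: every function name is hypothetical, faithfulness is expressed here as a fraction of supported claims (an assumption about normalization), and the per-claim support labels are assumed to come from a human reviewer or a separate checker.

```python
def exact_match_accuracy(outputs, references):
    """Fraction of outputs that match their ground-truth answer.

    Comparison ignores case and extra whitespace (an illustrative choice).
    """
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0

    def normalize(text):
        return " ".join(text.lower().split())

    matches = sum(normalize(o) == normalize(r) for o, r in zip(outputs, references))
    return matches / len(outputs)


def faithfulness(claim_supported):
    """Fraction of claims in an answer that the source material supports.

    `claim_supported` is a list of booleans, one per claim.
    """
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_precision_recall(retrieved_ids, relevant_ids):
    """Precision and recall of retrieved context against known-relevant items."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data for illustration only.
accuracy = exact_match_accuracy(["Paris", "  berlin "], ["paris", "Rome"])
faith = faithfulness([True, True, False, True])
precision, recall = context_precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(accuracy, faith, precision, round(recall, 3))
```

Exact match is the strictest form of ground-truth comparison; for free-text answers, a similarity score or a critic model is usually a better fit than string equality.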

How secure is my data?

At Composo, data security is our top priority. We serve many enterprise customers in regulated industries and employ robust measures to safeguard your information.

Your application runs entirely on your own server and Composo only has access to inputs and outputs, never your underlying data or systems.

For highly sensitive data, we recommend anonymization. However, we also offer end-to-end encryption and dedicated instances where absolutely necessary.

Can Composo deploy on my company's infra?

Yes. Reach out to us to discuss further!

How easy is it to use?

Composo is designed to be really easy to use.

The initial setup for an engineer is very quick, requiring only a few lines of code.

After this setup, using Composo is intuitive & requires no code. This makes it perfect for anyone, whether technical or not (e.g. a business user, domain expert or product manager).

Does Composo work for complex applications such as agentic systems?

Yes, Composo is designed to be completely flexible to handle any complexity of application, whether it has agents, retrieval-augmented generation (RAG), tool use, or anything else.

It lets you test and optimize full end-to-end performance, as well as isolate and evaluate the performance of individual components or agents within a GenAI system.

How much does it cost?

Get in touch for a demo & we can talk through how our pricing works!

Still have questions?

Get in touch. We’d love to learn about what you’re building!

Contact Us

Start using Composo today

With rapid codebase integration and flexible pricing, we make it easy to take your GenAI app testing to the next level.