I'm in the middle of stress-testing o3-mini and DeepSeek-r1.

o3-mini is the winner - it's not even a competition!

I'm testing each model against the most common developer tasks:
1️⃣ Build a new project from scratch
2️⃣ Build a new feature in an existing app
3️⃣ Refactor existing code and generate tests

-

With o3-mini and Cursor, I was able to build a ChatGPT replica in a single shot. It lets me chat with local LLMs, and everything works like a charm!

On the other hand, DeepSeek-r1 got stuck and only generated a single JavaScript file, far from a fully functional website.

Here's the GitHub repo with the prompts so you can replicate the results on your own:

github.com/bhancockio/o3-mini-vs-deepseek-r1

The screenshot below shows the functional app that o3-mini was able to create in a single shot!

I'm still working on the other two tasks, but right now there is a clear winner.

I'll keep you posted as I keep testing.

Also, I'll be putting all of this into a YouTube video that will hopefully come out on Monday!

If you have any questions about o3-mini or DeepSeek-r1, let me know!

I feel like I have a PhD in these models after all the testing I've done over the past 24 hours.