DeepSeek V4 vs Claude Opus 4.7 vs GPT-5.5: I Tested All Three on Real Business Tasks
Three frontier AI models all launched within a week of each other. I decided to spend two days putting them through the kind of tasks small business owners actually run.
By Sam Frost
Published Apr 27, 2026 · 5 min read

Last week saw three of the biggest AI model launches of 2026. Anthropic shipped Claude Opus 4.7. OpenAI followed with GPT-5.5. Then DeepSeek dropped V4, an open-source model from China that costs roughly a tenth of what its American rivals charge.
If you run a business that uses AI for anything important (writing, coding, customer support, research, document analysis), you've got three new options to consider. Right now, most of the comparisons out there are benchmarks: numbers that might look impressive on paper but mean nothing when it comes to helping you execute.
So I decided to run my own tests: seven tasks pulled from the kind of work I do every day across my business, with the same prompts given to each model.
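If you want to reproduce the setup, here's a minimal sketch of the harness idea: one prompt sent to all three providers through their Python SDKs. The model IDs below are placeholders (check each provider's docs for the current names), and the sketch assumes OPENAI_API_KEY, ANTHROPIC_API_KEY, and DEEPSEEK_API_KEY are set in your environment. DeepSeek exposes an OpenAI-compatible endpoint, which is why the same SDK works for both.

```python
# Minimal sketch: send one prompt to all three providers and collect replies.
# Model IDs are placeholders; check each provider's docs for current names.
import os
from openai import OpenAI
import anthropic

PROMPT = "Your test prompt here."

# GPT-5.5 via the OpenAI SDK (reads OPENAI_API_KEY from the environment)
gpt_reply = OpenAI().chat.completions.create(
    model="gpt-5.5",  # placeholder ID
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

# Claude Opus 4.7 via the Anthropic SDK (reads ANTHROPIC_API_KEY)
claude_reply = anthropic.Anthropic().messages.create(
    model="claude-opus-4.7",  # placeholder ID
    max_tokens=1024,
    messages=[{"role": "user", "content": PROMPT}],
).content[0].text

# DeepSeek's API is OpenAI-compatible, so the same SDK works with a new base URL
deepseek_reply = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
).chat.completions.create(
    model="deepseek-chat",  # placeholder ID
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

for name, reply in [("GPT-5.5", gpt_reply),
                    ("Claude Opus 4.7", claude_reply),
                    ("DeepSeek V4-Pro", deepseek_reply)]:
    print(f"=== {name} ===\n{reply}\n")
```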
Before I get into the business tests though, I wanted to start with something even simpler.
Test zero: a setup test, not a serious one
Before running the actual business tests, I wanted a quick gut check: a warm-up to set the scene before things get more rigorous. The prompt:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
This is a question with one right answer: you drive. The car needs a wash. The results are an interesting illustration of each model's personality. DeepSeek V4-Pro got it right and showed a lighter side with some humor. GPT-5.5 also got it right and got there the fastest. Claude Opus 4.7 got it wrong, and confidently so.
One prompt, not a serious test. But it matters because it tracks with something I've noticed using Claude Opus 4.7: it's more willing to push back and disagree with the user than previous versions, even when it's wrong. DeepSeek leaned into personality, something it lacked in V3.2. GPT-5.5 was efficient to the point of bluntness, a notable shift given that previous versions were accused of rambling.
With that out of the way, here are the actual tests.
Test 1: handling a tricky customer support email
The first real test is something every business owner has to deal with: a customer wants something they’re not entitled to, and your job is to say no without causing even more problems. Here’s the prompt:
A customer has emailed us asking for a full refund on a $299 software subscription. They've used it for 3 weeks. Our refund policy is 14 days. They're claiming the software is 'much harder to use than expected' but our records show they've completed 4 of the 7 onboarding steps and used the product 12 times. Write a reply that declines the refund but offers a free 30-minute onboarding call instead. Tone should be empathetic but firm. Don't mention the policy by name like a robot, just explain naturally.
This is a test of both judgment and writing. The model has to hold a position the customer won't like, do it without sounding like a robot, and pivot to something that will solve the customer's complaint.
All three models passed the basic test, but the differences were sharper than I expected. DeepSeek V4-Pro produced the most send-ready email of the three, with a tight structure and a nice reframe of the customer's frustration into something positive. GPT-5.5, despite hitting every required beat, felt like it was written by AI, without any character; I can't help but wonder whether that would further frustrate the customer. Claude Opus 4.7 wrote the longest and perhaps most human email of the three, and it explained its reasoning, which is useful if you're using AI as a collaborator and complete noise if you just wanted the email.
For pure send-ready output: DeepSeek V4-Pro wins. The email is ready to go without editing. For collaboration: Claude Opus 4.7 wins. The email itself is strong, and the notes are genuinely useful.
One issue: every model used em-dashes liberally. That matters if you care about your writing not reading like AI wrote it; em-dashes have become a widely recognized tell of AI output. All three replies would need a quick edit before sending.
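If you want to make that cleanup pass systematic, here's a minimal sketch that flags any line of a drafted reply containing an em-dash so you can reword it by hand. The filename reply_draft.txt is just an example.

```python
# Flag lines in a drafted reply that contain an em-dash, so they can be
# reworded by hand before the email goes out.
from pathlib import Path

def lines_with_em_dashes(text: str) -> list[str]:
    # "\u2014" is the em-dash character
    return [line for line in text.splitlines() if "\u2014" in line]

draft = Path("reply_draft.txt").read_text(encoding="utf-8")  # example filename
for line in lines_with_em_dashes(draft):
    print("Needs an edit:", line)
```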
Test 2: drafting a contractor agreement clause
Most small business owners draft their own contracts because hiring a lawyer for every $5,000 freelance project doesn’t make financial sense. This test was designed to see how the models handle a real agreement drafting task with competing interests on both sides. Here’s the prompt:
Write me a 'scope of work and revisions' clause for a freelance contract. Context: I'm a marketing agency hiring a freelance designer for a $5,000 project: logo, brand guidelines, and 5 social media templates. I want to limit revisions to 3 rounds total to prevent scope creep, but I want it worded fairly so the freelancer doesn't feel pinned. Plain English, no legalese. Make it the kind of clause both sides would sign without arguing.
All three produced something usable, but the gap between them was the largest so far. DeepSeek V4-Pro wrote what feels like a real freelance contract clause drafted by someone who has been on both sides of one. It clearly defines each revision round, explicitly calls out what doesn't count as a revision, and closes with a "quick note on the spirit" paragraph that humanizes the legal language. GPT-5.5 wrote something generic and perhaps a little too tight; all the elements are there, but it reads like a template from a random PDF you found online. Claude Opus 4.7 turned in the most detailed agreement, covering edge cases the other two didn't (silent client approval, what counts as new scope versus a revision, revision tracking). The level of detail is impressive, but it's another fine example of Opus overthinking: Claude wrote a clause for a $50,000 engagement; DeepSeek wrote one for the $5,000 brief.
The clear winner this time: DeepSeek V4-Pro. The clause is the right length, strikes the right tone, and gets the job done.