Vibe coding is a new term that entered our lives with AI coding tools like Cursor. It means coding by prompting alone. We ran several benchmarks to test vibe coding tools, and based on that experience, we prepared this detailed guide.
There are many AI code editors with different features. The most popular ones are:
- Cursor: Cursor’s Composer feature is an agentic tool, in other words, it can create, edit, and delete files in your codebase.
- Windsurf: Windsurf’s Cascade works similarly to Cursor’s Composer; users describe the changes they want and let the agent build them.
- Replit: Replit works on the browser, which some users prefer. It can also be used as a mobile app.
- Cline: Cline is the only open-source tool in this list.
- Claude Code: Claude Code is an experimental tool made by Anthropic. Currently, you can enroll via the waitlist.
- Aider
- Lovable.dev
- Bolt
- v0 by Vercel
These tools share similar features. They use AI models to generate code, modify existing code, and explore a codebase based on the user’s prompt. They can even run terminal commands and fix errors by reading the error messages.
Some of them also support the Model Context Protocol (MCP).
Cursor went from $1M to $100M ARR in two years, a rapid rise that shows both the importance of the topic and the popularity of these tools.
How does it work?
These tools are powered by AI, so they either have their own LLM or offer integrations with models like Claude 3.5 Sonnet and OpenAI’s o1.
While Claude 3.5 Sonnet is the clear favorite among users, some prefer a different path: they reported that using DeepSeek R1 in the planning phase and Claude 3.5 Sonnet in the coding phase positively impacted their projects.
Although it is a brand-new model, Claude 3.7 Sonnet has also been tested by users. Some report that it is better than all the other models, especially for front-end work.
Others report that it is too “self-confident,” adds many unnecessary and unwanted features to a project, and is not good at following the prompted rules.
What are the best practices of vibe coding?
Planning is key; every feature must be planned in full detail.
Writing those plans and rules in .cursorrules (or in a dedicated file if you are using a tool other than Cursor) helps the AI tool stay aligned.
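For illustration, a minimal .cursorrules file for a hypothetical web project might look like the sketch below; every rule here is an assumed example, not an official template:

```text
# .cursorrules (illustrative example)
- Use TypeScript for all new files.
- Follow the existing folder structure under src/.
- Write a unit test for every new function.
- Do not add new dependencies without asking first.
- After implementing a feature, summarize it in docs/features.md.
```

Keeping rules short, concrete, and imperative like this tends to make them easier for the agent to follow consistently.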
Users also mentioned that having the AI document every implemented feature in a separate file helps it follow the guidelines more strictly.
Hallucination is a major problem, especially in large codebases, so users must keep this in mind.
Do not forget to use a code review tool before publishing the project to ensure safety.
How will it affect the future of software engineers?
This is a controversial topic:
Optimists claim that these tools help develop software faster and more easily. With them, one month’s worth of work can be done in one day. They also allow non-developers to build software without needing coding skills.
Pessimists, on the other hand, say that these tools are killing developers’ coding skills. A junior developer with Cursor is not learning any new skills, and this is a problem for the future. Also, AI handling every task is a huge threat to software development as it is currently defined.
It may also lead to some security issues; therefore, the high-security sectors will not adopt AI-generated code for a while.
As Karpathy said, most people now just “see stuff, say stuff, run stuff, and copy paste stuff.” This will make ideas more important than coding skills in software engineering.
A realistic point of view
A software project usually needs several developers and designers. With these tools, a technical but non-developer user can build their own project and earn money from it.
The definition of software development will likely change in the coming years: those with strong skills and creativity will survive, while much of today’s work (especially in web and app development) will be taken over by AI.
Please note that none of these benchmarks produced complete software, but that does not mean the tools are incapable of it. To keep the benchmarks as objective as possible, we did not do any further prompting to fix the issues in the codebases.
You can read them in more detail by following the links:
Cursor vs. Windsurf vs. Replit:
We ran two tasks with Cursor, Windsurf, Replit, Claude Code, and Cline.
- Prompt-to-API benchmark: Windsurf leads this benchmark. Replit was marked N/A in this task since it was not able to use Heroku for deployment.
- App building benchmark: Claude Code is the leader of this benchmark, with a 93% success rate.
Screenshot-to-Code:
We tested v0, Bolt, and Lovable with 5 Figma design screenshots and asked them to turn these into code. v0 and Bolt were the most successful tools, with success rates above 70%.
AI Website Creator:
We prompted v0, Bolt, Lovable, and CerebrasCoder to create a website; the leader of this benchmark is v0 with a 90% success rate.
AI Coding Benchmark:
We tested the AI coding assistants across 5 different criteria. Benchmarked tools are Cursor, Amazon Q, GitLab, Replit, Cody, Gemini, Codeium, Codiumate, GitHub Copilot, and Tabnine. The overall leader of this benchmark is Cursor.
LLM Coding Benchmark – LMC Eval:
We benchmarked leading LLMs on 100 different logic/math coding questions; OpenAI’s o1 and o3-mini are the leaders of this benchmark.
Is AI-generated code safe to use?
AI coding assistants usually generate safe code, but users must be aware that they can hallucinate or leave backdoors in the system. Therefore, generated code should always be checked by a human expert. It seems easy to throw together weekend projects with AI-assisted development, but scaling them and making them safe for customers still requires an experienced developer. Users should therefore not treat it as “copy-paste stuff” but stay aware of the whole workflow.
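As a minimal sketch of what one automated pre-publish check might look for, the snippet below scans source text for hardcoded secrets. The regex patterns and the sample input are illustrative assumptions, and this is no replacement for a real code review tool or a human reviewer:

```python
import re

# Illustrative patterns for common hardcoded secrets (assumption: not exhaustive).
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*['\"][^'\"]{16,}['\"]"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_source(text: str) -> list:
    """Return (line_number, finding) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

# Example: a snippet an AI assistant might have generated.
generated = 'API_KEY = "sk-test-aaaaaaaaaaaaaaaaaaaa"\nprint("hello")'
print(scan_source(generated))  # flags line 1 as a generic API key
```

A check like this only catches the most obvious leaks; logic flaws and subtle backdoors still require expert review.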