Eng leadership at my place are pushing Cursor pretty hard. It's great for banging out small tickets and improving the product incrementally kaizen-style, but it falls down with anything heavy.
I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times. I think it may be doing the same to me too.
Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.
As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
I've been a paying Cursor user for 4-5 months now and I'm feeling the same. A lot more mistakes are leaking into my PRs. I feel a lot faster, but there's been a noticeable decrease in the quality of my work.
Obviously I could just review my own code better, but that's proving easier said than done, to the point where I'm considering going back to vanilla VS Code.
Same result - I tried it for a while out of curiosity but the improvements were a false economy: time saved in one PR is time lost to unplanned work afterwards. And it is hard to spot the mistakes because they can be quite subtle, especially if you've got it generating boilerplate or mocks in your tests.
Makes you look more efficient but it doesn't make you more effective. At best you're just taking extra time to verify the LLM didn't make shit up, often by... well, looking at the docs or the source.. which is what you'd do writing hand-crafted code lol.
I'm switching back to emacs and looking at other ways I can integrate AI capabilities without losing my mental acuity.
Just your run-of-the-mill hallucinations, e.g. mocking something in pytest but only realising afterwards that the mock was hallucinated, the test was based on the mock, and so the real behaviour was never covered.
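A minimal hypothetical sketch of that failure mode in pytest (the function name here is invented for illustration):

    from unittest import mock

    def send_welcome_email(user):
        ...  # imagine the real implementation, which talks to an SMTP client

    def test_send_welcome_email():
        # The LLM ended up mocking the very thing under test, so the assertion
        # only checks the mock's canned return value; the real behaviour is
        # never executed.
        with mock.patch(f"{__name__}.send_welcome_email", return_value=True):
            assert send_welcome_email("alice") is True  # passes vacuously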
I mean, I generally avoid using mocks in tests for that exact reason, but if you expect your AI completions to always be wrong you wouldn't use them in the first place.
Beyond that, the tab completion is sometimes too eager and gets in the way of actually editing, and is particularly painful when writing up a README where it will keep suggesting completely irrelevant things. It's not for me.
> the tab completion is sometimes too eager and gets in the way of actually editing
Yea, this is super annoying. The tab button was already overloaded between built-in intellisense stuff and actually wanting to insert tabs/spaces, now there are 3 things competing for it.
I'll often just want to insert a tab, and end up with some random hallucination getting inserted somewhere else in the file.
> Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.
> As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
If it gets too expensive, then I guess the alternative becomes using something like Continue.dev or Cline with one of the providers like Scaleway that you can rent GPUs from or that have managed inference… either that, or having a pair of L4 cards in a closet somewhere (or a fancy Mac, or anything else with a decent amount of memory).
And if there are no well-priced options anywhere (e.g. the upfront investment for a company to buy its own GPUs to run Ollama or something else is too steep), then that just means that running LLM-based systems is economically infeasible for many right now.
Refactoring a critical function that is very large and complex. Yeah, maybe it shouldn't be so large and so complex, but it is and that's the codebase we have. I'm sure many other companies do too.
You're missing the step where I have to articulate (and type) the prompt in natural language well enough for the tool to understand and execute. Then if I fail, I have to write more natural language.
You said just the same in another of your posts:
> if you can begin to describe the function well
So I have to learn how to describe code rather than just writing it as I've done for years?
Like I said, it's good for the small stuff, but bad for the big stuff, for now at least.
> So I have to learn how to describe code rather than just writing it as I've done for years?
If we keep going down this path, we might end up inventing artificial languages for the purpose of precisely and unambiguously describing a program to a computer.
Exactly my point: you don't ask the LLM to give you the whole function. That would be too much English work, because it means you'd have to write down the contract of the function as a concrete and/or list (a list of lists).
You ask it to give you one block at a time.
iterate over the above list and remove all strings matching 'apple'
open file and write the above list
etc etc kind of stuff.
Notice how the English here can be interpreted only one way, yet the LLM still acts as a good, intelligent coding assistant.
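For instance (using Python for illustration), those one-block prompts come back roughly as this kind of thing; a sketch, the exact code varies:

    items = ["apple", "banana", "apple", "cherry"]  # built in an earlier step

    # "iterate over the above list and remove all strings matching 'apple'"
    items = [s for s in items if s != "apple"]

    # "open file and write the above list"
    with open("items.txt", "w") as f:
        f.write("\n".join(items) + "\n")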
>>I think in code, I'd rather just write in code.
Continue to think, just make the LLM type out the outcome of your ideas.
I'm sure you have a great point to make, but you're making it very poorly here.
Experienced developers develop fluency in their tools, such that writing the kind of narrow natural-language directives you suggest is grossly regressive. Often the tasks don't even go through our heads in English like that; they simply flow from the fingers onto the screen in the language of our code, libraries, and architecture. Are you familiar with that experience?
What you suggest is precisely like a fluently bilingual person, who can already speak and think in beautiful, articulate, and idiomatic French, opting to converse to their Parisian friend through an English-French translation app instead of speaking to them directly.
When applied carefully, that technique might help someone who wants to learn French get more and deeper exposure than without a translation app, as they pay earnest attention to the input and output going through their app.
And that technique surely helps someone who never expects to learn French navigate their way through the conversations they need to have in a sufficient way, perhaps even opening new and eventful doors for them.
But it's an absolutely absurd technique for people whose fluency is such that there's no "translating" happening in the first place.
>>Experienced developers develop fluency in their tools
>>You can see that right?
I get it, but this is as big a paradigm shift as Google and the Internet were to people in the 90s. Sometimes how you do things changes, and the new paradigm becomes too big a phenomenon to ignore. That's where we are now.
You have to understand that sometimes a trend or phenomenon is so large that fighting it is pointless and somewhat resembles Luddite behaviour. Not moving with the times is how you age out and get fired. I'm not talking about a normal layoff, but about becoming totally irrelevant to whatever is happening in the industry at large.
That could totally be something that we encounter in the future, and perhaps we'll eventually be able to see some through line from here to there. Absolutely.
But in this thread, it sounds like you're trying to suggest we're already there or close to it, but when you get into the details, you (inadvertently?) admitted that we're still a long ways off.
The narrow-if-common examples you cited slow experienced people down rather than speed them up. They surely make some simple tasks more accessible to inexperienced people, just like in the translation app example, and there's value in that, but it represents a curious flux at the edges of the industry -- akin to VBA or early PHP -- rather than a revolutionary one at the center.
It's impactful, but still quite far from a paradigm shift.
It writes blocks of 50-100 lines fairly fast in one go, and I work in such chunks one block at a time. And I keep going, as I already have a good enough idea of what the program should look like. I can write something like a 10,000-line Perl script (100 such iterations) in no time, like one or two working days.
That's pretty fast.
If you're telling me to ask it to write the entire 20,000-line script in one go, that's not how I think, or how I approach anything in my life, let alone code.
To go a far distance, I go in cycles of small distances, and I go a lot of them.
I've spent a lot of time trying to make Aider and Continue do something useful, mainly with Qwen coder models. In my experience they suck. Maybe they can produce some scaffolding and boilerplate but I already have deterministic tools for that.
Recently I tried to make them figure out an algorithm that chugs through a list of strings and collects certain ones, grouping lines that match one pattern under the last line that matched another pattern, in a new list. They consistently fail, and not in ways that are obvious at a glance. Fixing the code manually takes longer than just writing it myself.
Usually it compiles and runs, but does the wrong thing. Sometimes they screw up recursion and only collect one string. Sometimes they add code for collecting the strings that are supposed to be grouped but never use it; so far it's been consistently wrong.
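For reference, the kind of loop I was after is roughly this (the patterns here are placeholders; the real ones are from a commercial project):

    import re

    HEADER = re.compile(r"^\[.+\]$")   # "group under the last line matching this"
    ENTRY = re.compile(r"^\s*\w+=")    # "collect lines matching this"

    def group_lines(lines):
        groups, current = [], None
        for line in lines:
            if HEADER.match(line):
                current = [line]           # start a new group under this header
                groups.append(current)
            elif ENTRY.match(line) and current is not None:
                current.append(line)       # attach to the most recent header
        return groups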
They also insist on generating obtuse and wrong regex, sometimes mixing PCRE with some other scheme in the same expression, unless I make threats. I'm not sure how other people manage to synthesise code they feel good about, maybe that's something only the remote models can do that I for sure won't send my commercial projects to.
> I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
As someone old enough to have built websites in Notepad.exe it's totally reasonable that I ask my teams to turn off syntax highlighting, brace matching, and refactoring tools in VSCode. I didn't have them when I started, so they shouldn't use them today. Modern IDE features are just making them lazy.
I think you're mistaking programmer productivity tools for AI that generates all the code for you, allowing you to switch off your brain completely. Prompt-engineering code is not the same skill as programming, and being good at it does not mean you actually understand how code or software works.
Change comes with pros and cons. The pros need to outweigh the cons (and probably significantly so) for change to be considered progress.
Syntax highlighting has the pro of making code faster to visually parse for most people at the expense of some CPU cycles and a 10 second setting change for people for whom color variations are problematic. It doesn't take anything away. It's purely additive.
AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
My junior developers think I don't know they are using AI coding tools. I discovered it about 2 months into them doing it, and I've been tracking their productivity both before and after. In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time. Or at all. Even basic things have to be rewritten because they aren't suitable for purpose. And in our pair programming sessions, I see them frozen up now, where they weren't before they started using the tools. I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I tried to use AI code generation once to fill in some ASP.NET Core boilerplate for setting up authentication. Should be basic stuff. Should be 3 or 4 lines of code. I've done it before, but I forgot the exact lines and had been told AI was good for this kind of lazy recall of common tasks. It gave me a stub that had a comment inside, "implement authentication here". Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation. And it still wasn't done. I haven't touched AI code gen since.
So IDK. I'm very skeptical of the claims that AI is writing significant amounts of working code for people, or that it at all rivals even a moderately smart junior developer (say nothing of actually experienced senior). I think what's really happening is that people are spending a lot of time spinning the roulette wheel, always betting on 00, and then crowing they're a genius when it finally lands.
> AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
> I'm very skeptical of the claims that AI is writing significant amounts of working code for people
You may be right, but people write far too much code as it is. Software development should be about thinking more than typing. Maybe AI's most useful feature will be writing something that's slightly wrong in order to get devs to think about a good solution to their problem and then they can just fix the AI's code. If that results in better software then it's a huge win worth billions of dollars.
The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
> At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
I am extremely skeptical that LLM-based generative AI running in silicon-based digital computers will improve to a significant degree over what we have today. Ever.
GPT-2 to GPT-3 was a sea-change improvement, but ever since then, new models have only improved incrementally, despite taking exponentially more power and compute to train. Couple that with the fact that processors are only getting wider, not faster or less energy-consuming, and without an extreme change in computing technology we aren't getting there with LLMs.
Either the processors or the underlying AI tech need to change, and there is no evidence this is the case.
> The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
I have no idea what you're even trying to say with this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
> I have no idea what you're even trying to say with this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
I'm saying there is a lot of value in tools that are merely 'better than the status quo'. An AI assistant doesn't need to be as good as a dev in order to be useful.
Same. I've heard that this area is improving exponentially for years now, but I can't really say the results I'm getting are any better than what I originally experienced with ChatGPT.
The dev tooling has gotten better; I use the integrated copilot every day and it saves me from writing a lot of boilerplate.
But it's not really replacing me as a coder. Yeah I can go further faster. I can fill in gaps in knowledge. Mostly, I don't have to spend hours on forums and stack overflow anymore trying to work out an issue. But it's not replacing me because I still have to make fine-grained decisions and corrections along the way.
To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
And like you I don't see how to get there from where we are. I think we're at a local maximum here.
> To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
> And like you I don't see how to get there from where we are. I think we're at a local maximum here.
To continue the car analogy - are you really suggesting we're at 'peak car'? You don't believe that cars in 20 years time are going to be significantly better than the cars we have today? That's very pessimistic.
There was a meme from back when - "if cars advanced like computers we'd be getting 1000 miles per gallon by now".
Thinking back to the car I had 20 years ago, it's not all that different from the car I have now.
Yes, the car I have now has a HUD, Carplay, Wireless iPhone charging, an iPhone app, adaptive cruise control, and can act as a wifi hotspot. But fundamentally it still does the same thing in the same way as 20 years ago. Even if we allow for EVs and Hybrid cars, it's still all mostly the same. Prius came out in 2000.
And now we've reached the point where computers advance like cars. We're writing code in the same languages, the same OS, the same instruction set, for the same chips as we did 20 years ago. Yes, we have new advancements like Rust, and new OSes like Android and iOS, and chipsets like ARM are big now. But iPhone/iPad/iMac, c/C++/Swift, OSX/MacOS/iOS, PowerPC/Intel/ARM.... fundamentally it's all homeomorphic - the same thing in different shapes. You take a programmer from the 70s and they will not be so out of place today. I feel like I'm channeling Bret Victor here: https://www.youtube.com/watch?v=gbHZNRda08o
And that's not for lack of advancements in languages, OSes, instruction sets, and hardware architectures; it's for lack of investment and commercialization. I can get infinite money right now to make another bullshit AI app, but no one wants to invest in an OS play. You'll hear 10,000 excuses about how MS this and Linux that and it's not practical and impossible and there's no money in it, and so on and so forth. The network effects are too high, and the in-group dynamic of keeping things the way they are is too strong.
But AGI? Now that's something investors find totally rational and logical and right around the corner. "I will make a fleet of robot taxis that can drive themselves with a camera" will get you access to unlimited wallets filled with endless cash. "I will advance operating systems past where they have been languishing for 40 years" is tumbleweeds.
The cars we have today are not so far off from the cars of 100 years ago. So yes, I highly doubt another 20 years of development, after all the low hanging fruit has already been picked, will see much change at all.
>>In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time.
Most people are using it to finish work sooner rather than to do more work. As a senior engineer, your job shouldn't be to stop the use of LLMs, but to create opportunities to build newer and bigger products.
>>I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I understand you and I grew up in a different era. But life getting easier for the young isn't exactly something we must resent. Things have only been getting easier with time, and have been for a few centuries. None of this is wrong.
>>Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation.
Honestly this largely reads like how my dad would describe technology from the 2000s. It was always that he was better off without it. Whether that was true or false is up for debate, but the world was moving on.
> As a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products.
I think you just hit the core point that splits people in these discussions.
For many senior engineers, we see our job as building better and more lasting products. Correctness, robustness, maintainability, consistency, clarity, efficiency, extensibility, adaptability. We're trying to build things that best serve our users, outperform our competition, enable effective maintenance, and include the design foresight that lets our efforts turn on a dime when conditions change while maintaining all these other benefits.
I have never considered myself striving towards "newer and bigger" projects and I don't think any of the people I choose to work with would be able to say that either. What kind of goal is that? At best, that sounds like the prototyping effort of a confused startup that's desperately looking to catch a wave it might ride, and at worst it sounds like spam.
I assure you, little of the software that you appreciate in your life has been built by senior engineers with that vision. It might have had some people involved at some stage who pushed for it, because that sort of vision can effectively kick a struggling project out of a local minimum (albeit sometimes to worse places), but it's unlikely to have been a seasoned "senior engineer" being the one making that push and (if they were) they surely weren't wearing that particular hat in doing so.
I don't get this idea that to build a stable product you must make your life as hard as possible.
One can use AI AND build stable products at the same time. These are not opposing goals, and beyond that, the assumption that AI will always generate bad code is itself wrong.
Very likely, people will build more stable and larger products using AI than ever before.
I understand and empathise with you; moving on is hard, especially when these kinds of huge paradigm-changing events arrive, and especially when you are no longer in the upswing of life. But the arguments you are making are very similar to those made by boomers about desktops, the internet, and even mobile phones. People have argued endlessly that the old way was better, but things only get better with newer technology that automates more than ever before.
I don't feel like you read my comment in context. It was quite specifically responding to the GP's point of pursuing "bigger and better" software, which just isn't something more senior engineers would claim to pursue.
I completely agree with you that "one can use ai AND build stable products at the same time", even in the context of the conversation we're having in the other reply chain.
But I think we greatly disagree about having encountered a "paradigm changing event" yet. As you can see throughout the comments here, many senior engineers recognize the tools we've seen so far for what they are, they've explored their capabilities, and they've come to understand where they fit into the work they do. And they're simply not compelling for many of us yet. They don't work for the problems we'd need them to work for yet, and are often found to be clumsy and anti-productive for the problems they can address.
It's cute and dramatic to talk about "moving on is hard" and "luddism" and some emotional reaction to a big scary imminent threat, but you're mostly talking to exceedingly practical and often very lazy people who are always looking for tools to make their work more effective. Broadly, we're open to and even excited about tools that could be revolutionary and paradigm-changing, and many of us even spend our days trying to discover or build those tools. A more accurate read of these conversations is that we're disappointed with these tools and in many cases just find that they don't come close to delivering on their promise yet.
> life getting easier for the young isnt exactly something we must resent.
I don't see how AI is making life easier for my developers. You seem to have missed the point that I have seen no external sign of them being more productive. I don't care if they feel more productive. The end result is they aren't. But it does seem to be making life harder for them because they can't seem to think for themselves anymore.
> a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products
Well then, we're in agreement. I should reveal to my juniors that I know they are using AI and that they should stop immediately.
> I understand you and I grew up in a different era. But life getting easier for the young isnt exactly something we must resent.
Of course not. But, eventually these young people are going to take over the systems that were built and understood by their ancestors. History shows what damage can be caused to a system when people who don't fully understand and appreciate how it was built take it over. We have to prepare them with the necessary knowledge to take over the future, which includes all the warts and shit piles.
I mean, we've spent a lot of our careers trying to dig ourselves out of these shit piles because they suck so bad, but we never got rid of them, we just hid them behind some polish. But it's all still there, and vibe coders aren't going to be equipped to deal with it.
Maybe the hope is that AI will reach god-like status and just fix all of this for us magically one day, but that sounds like fixing social policy by waiting for the rapture, so we have to do something else to assure the future.
Funny / sad. GP is just highlighting the all too common attitude of people who grew up using new tech (graphing calculators, Wikipedia, etc) who reach a certain age and suddenly new tech is ruining the youth of today.
It’s just human nature, you can decide if it’s funny or sad or whatever
Neither of you have comprehended the part of my post where I talk about myself and my own skills.
Hiding behind the sarcasm tag to take the piss out of people younger than you, I don't think that's very funny. The magnetised needle and a steady hand gag from xkcd, now that is actually funny.
If you are using LLMs to write anything more than an if/else or for block at a time, you are doing it wrong.
>>I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
When I first started work, my employer didn't provide internet access to employees. Their argument was always: how would you code if there was no internet connection, out there in the real world? As it turns out, they were not only worried about the wrong problem, they got the whole paradigm of this new world wrong.
In short, it was not worth building anything at all for a world where the internet doesn't exist.
>>then one day it's not cheap ...
Again, you are worried about the wrong thing. Your worry should not be what happens when it's no longer cheap, but what happens when it in fact gets cheaper. Which it will.
> If you are using LLMs to write anything more than a if(else)/for block at a time you are doing it wrong
Then what value are they actually adding?
If this is all they are capable of, surely you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
This is how I feel. I mentioned this to a couple of friends over a beer and their answer was that there are many not "decently competent programmer"s in the industry currently and they benefit immensely from this technology, at the expense of the stability and maintainability of the system they are working on.
That said, they are fairly context-aware about what you are asking, so they can save a lot of RTFM and code/test cycles. At times they can look at the functions that are already built and write new ones for you, if you can begin to describe the function well.
But if you want to write a good function, one written to fit tight specifications, it's too much English. You need to describe step by step what is to be done, plus the exceptions, and at some point you are just doing logic programming (https://en.wikipedia.org/wiki/Logic_programming), in the sense that the whole English text looks like a list of and/or situations plus exceptions.
So you have to go one atomic step (a decision statement or a loop) at a time. But that's a big productivity boost too, the reason being you get lots of code in place without having to manually type it out.
>>you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
Honestly speaking, most coding is manually laborious if you don't know touch typing, and even if you do, it's a chore.
I remember when I started using Copilot with React, it did a lot of the typing work I'd otherwise have to do.
>>I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
IMO, my brain at least has seen so many code patterns and debugging situations over the years, and knows what to anticipate and assemble as I go, that having an intelligent typing assistant is a major productivity boost.
>>Why are people so bullish on them?
Eventually newer programming languages will come along and people will build larger things.
Honestly, a lot of the problems people have with programming that they use AI to solve can be solved with better language design and dev tools.
For example, I like LLMs because they take care of a lot of the boilerplate I have to write.
But I only have to write that boilerplate because it's part of the language design. Advances in syntax and programming systems can yield similar speedups in programming ability. I've seen a 100x boost in productivity that came down to switching to a DSL versus C++.
Maybe we need more DSLs, better programming systems, better debugging tools, and we don't really need LLMs the way LLM makers are telling us? LLMs only seem so great because our computer architecture, languages and dev tooling and hardware are stuck in the past.
Instead of being happy with the Von Neumann architecture, we should be exploring highly parallel computer architectures.
Instead of being happy with imperative languages, we should be investing heavily in studying other programming systems and new paradigms.
Instead of being happy coding in a 1D text buffer, we should be investing more in completely imaginative ways of building programs in AR, VR, 3D, 2D.
LLMs are going to play a part here, but I think really they are a band-aid for a larger problem, which is that we've climbed too high in one particular direction (von Neumann/imperative/text) and we are at a local maximum. We've been there since 2019, maybe.
There are many other promising peaks to climb, avenues of research that were discovered in the 60s/70s/80s/90s have been left to atrophy the past 30 years as the people who were investigating those paths refocused or are now gone.
I think all these billions invested in AI are going to vaporize, and maybe then investors will focus back on the fundamentals.
LLMs are like the antenna at the top of the Empire State Building. Yes, you can keep going up if you climb up there, but it's unstable and eventually there really is a hard limit.
If we want to go higher than that, we need to build a wider and deeper foundation first.
Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.
Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
If you prune context out of the initial prompt, then instead of reasoning on richer context the LLM reasons only on the prompt itself (with no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.
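In rough pseudocode (the helper and method names here are hypothetical, just to show the ordering difference as I understand it):

    def reason_with_full_context(llm, prompt, attached_files):
        # what you'd want: the thinking pass sees the actual code
        context = prompt + "".join(open(f).read() for f in attached_files)
        return llm.think(context)

    def reason_then_retrieve(llm, prompt, attached_files):
        # what appears to happen: thinking runs on the bare prompt, and the
        # file contents only arrive via tool calls after the plan is made
        plan = llm.think(prompt)
        for f in attached_files:
            plan = llm.call_tool("read_file", f, previous=plan)
        return plan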
On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.
Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.
>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.
In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.
> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.
There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.
There's also work being done on think - summarise - think - summarise - etc. And on various "RAG"-like thinking.
This is only surface-level deep. Cursor already has quotas on their paid plans and usage-based pricing for the larger models, which I run into and spill over to every month.
Imo most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way"™ to build that context window automatically to reach a coding panacea. They just aren't there yet.
If you’re going to pay on the margin, why not use those incremental dollars running the same requests on cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider apis with cline always does a much better job for me.
> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.
What I mean is that their implementation (thinking only on the first response) yields zero benefit because the model doesn't see the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, performance would be great, but so far this is not what they are doing (yet). It also dramatically increases the cost of running the same operation.
But the way those models work is to run everything again once the function call results come in. Are you saying Cursor is not using the model you selected on function call responses?
That's my point. By offering unlimited requests (500 fast requests + unlimited slow requests) to people paying a fixed $20/mo, Cursor has put itself into a ruthless marginal-cost optimization game where one of its biggest levers for success is reducing context sizes and discouraging thinking after every function call.
Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
Reflecting on your comment, I realized that using a huge number of GPUs is akin to a Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of reading/writing heads with an unbounded number of rules, we can solve decidability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to postulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decidability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...
Or put another way, isn't the promise of software that can generate any software from a natural-language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.
> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.
You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.
Which local model would you recommend that comes close to cursor in response quality? I have tried deepseek, mistral, and few others. None comes close to quality of cursor. I keep coming back to it.
See, the thing is, I never wanted to comment on the cost or feasibility of the hardware at all. What I was commenting on was that any backup plan is expected to be subpar by its very nature, and if not, it should be instantly promoted. If you'll notice, that was 100% of what I said. I was adding to the pile of "this plan is stupid". Cursor has an actual value proposition.
Of course then you disrespected me with a rude ad hominem and got a rude response back. Ignoring the point and attacking the person is a concession.
For the record, I and many others use throwaways every single thread. This isn't and shouldn't be reddit.
You're right, I shouldn't have said the throwaway bit, sorry. However, you're ignoring the context of the conversation, which is a $10k piece of hardware. I don't know what you expected to add to the conversation by saying "who cares?" when someone asks for advice, in context or even in isolation.
wrong. where the user is asking for recommended models (for offline use), they’re not saying “yes in fact I will burn $10000 on a computer”, not at all lol
I'm trying to do that with a 32 GB M1 laptop, and it's hard to get even 1,000 euros for it in the Netherlands, whereas the refurbished price is double that.
Maybe this "backup" solution.. developed into commodity hardware as an affordable open source solution that keeps the model and code locally and private at all times is the actual solution we need.
Let's say a cluster of Raspberry Pis / low-powered devices producing results as good as Claude 3.7 Sonnet. Would it be completely infeasible to create a custom model that is trained on your own code base and might not be a fully fledged LLM, but provides similar features to Cursor?
Have we all gone bonkers sending our code to third parties? The code is the thing you want to keep secret unless you're working on an open source project.
The UX of tools like these is largely constrained by how good they are with constructing a complete context of what you are trying to do. Micromanaging context can be frustrating.
I played with aider a few days ago. Pretty frustrating experience. It kept telling me to "add files" that are in the damn directory that I opened it in. "Add them yourself" was my response. Didn't work; it couldn't do it somehow. Probably once you dial that in, it starts working better. But I had a rough time with it creating commits with broken code, not picking up manual file changes, etc. It all felt a bit flaky and brittle. Half the problem seems to be simple cache coherence issues and me having to tell it things that it should be figuring out by itself.
The model quality seems less important than the plumbing to get the full context to the AI. And since large context windows are expensive, a lot of these tools are cutting corners all the time.
I think that's a short-term problem. Not cutting those corners is valuable enough that the logical end state is tools that don't cut them and cost a bit more. Just load the whole project. Yes, it will make every question cost $2-3 or something like that. That's expensive now, but if it drops by 20x we won't care.
Basically, large models that support huge context windows of millions/tens of millions of tokens cost something like the price of a small car and use a lot of energy. That's OK. Lots of people own small cars, because they are kind of useful. AIs that have a complete, detailed context of all your code, requirements, intentions, etc. will be able to do a much better job than one that has to guess all of that from a few lines of text. That would be useful. And valuable to a lot of people.
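Back-of-the-envelope, using prices that are my assumptions rather than any provider's actual rate card:

    context_tokens = 1_000_000            # "just load the whole project"
    assumed_price_per_million = 3.00      # assumed $ per 1M input tokens
    cost_per_question = context_tokens / 1_000_000 * assumed_price_per_million
    print(cost_per_question)              # ~$3 today; a 20x drop puts it around $0.15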
Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
> aider [...] It kept telling me to "add files" that are in the damn directory that I opened it in.
That's intentional, and I like it. It limits the context dynamically to what is necessary (of course it makes mistakes). You can also add files with placeholders and in a number of other ways, but most of the time I let Aider decide. It has a repomap (https://aider.chat/docs/repomap.html), gradually builds up knowledge, and makes proposals based on this and other information it has gathered, keeping token costs and context-window limits in mind.
As for manual changes: aider is opinionated regarding the role of Git in your workflow. At first glance, this repels some people and some stick to this opinion. For others, it is exactly one of the advantages, especially in combination with the shell-like nature of the tool. But the standard Git handling can still be overridden. For me personally, the default behavior becomes more and more smooth and second nature. And the whole thing is scriptable, I only begin to use the possibilities.
In general: Tools have to be learned, impatient one-shot attempts are simply not enough anymore.
> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
OTOH currently the LLM companies are probably taking a financial loss with each token. Wouldn't be surprised if the price doesn't even cover the electricity used in some cases.
Also e.g. Gemini already runs on Google's custom hardware, skipping the Nvidia tax.
> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
That still leaves us with an ungodly amount of resources used both to build the GPUs and to run them for a few years before having to replace them with even more GPUs.
It's pretty amazing to me how quickly the big tech companies pivoted from making promises to "go green" to buying as many GPUs as possible and burning through entire power plants' worth of electricity.
Try Claude Code. It figures out context by itself. I’m having a lot of success with it for a few days now, whereas I never caught on with Cursor due to the context problem.
This is a simple code assistant that doesn't get in your way and makes sure you are coding (not losing your ability to program).
You configure a Replicate API token, install the tool, and point it at your code base.
When you save a file it asks the LLM for advice and feedback on the file as a "senior developer".
Run it alongside your favorite editor to get feedback from an LLM as you work (on open source code; nothing you don't want third parties to see).
You are still programming and using your brain but you have some feedback when you save files.
The feedback is less computationally expensive and less fraught with difficulty than actually getting code from LLMs, so it should work with much less powerful models.
It would be nice if there was a search built in so it could search for useful documentation for you.
I've found tools like Cursor useful for prototyping and MVP development. However, as the codebase grows, they struggle. It's likely due to larger files or an increased number of them filling up the context window, leading to coherence issues. What once gave you a speed boost now starts to work against you. In such cases, manually selecting relevant files or snippets from them yields better results, but at that point it's not much different from using the web interface to something like Claude.
I had that same experience with Claude Code. I tried to do a 95% "Idle Development RPG" approach to developing a music release organization software. At the beginning, I was really impressed, but with more and more complexity, it becomes increasingly incoherent, forgetting about approaches and patterns used elsewhere and reinventing the wheel, often badly.
Or the context not being large enough for all the obscure functions and files to go into the context. I am too basic to have dug deep enough, but a simple (automatic) documentation context for the entire project would certainly improve things for me.
Agreed. One useful tip is to have Cursor break up large files into smaller files. For some reason, the model doesn't do this naturally. I've had several Cursor experiments grow into 3000+ line files because it just keeps adding.
Once the codebase is reasonably structured, it's much better at picking which files it needs to read in.
I've tried Cursor a couple of times, but my complaint is always the same: why fork VS Code when all this functionality could just be an extension, same as Copilot?
Some VSCode extensions don't work, you need to redo all your configuration, add all your workspaces... and the gain vs Copilot is not that high
> why forking VS Code when all this functionality could just be an extension, same as Copilot does?
Have you programmed extensions for VSCode before? While it seems like a fairly extensible system overall, the editor component in particular is very restrictive. You can add text (that's what extensions like ErrorLens and GitLens are doing), inlay hints, and on-hover popup overlays (those can only trigger on words, not on punctuation). What Cursor does (the automatic diff-like views of AI suggestions with graphic outlines, floating buttons, and whatnot right on top of the text editing view) is not possible in vanilla VSCode.
This was originally driven by the necessity of tighter control over editor performance. In its early days VSCode was competing with Atom, another extensible JS-powered editor from GitHub, and while Atom had an early lead due to a larger extensions catalog, VSCode ultimately won the race because they managed to maintain lower latency in their text editor component. Nowadays they still don't want to introduce extra extension points to it, because newer, faster editors pop up all the time, too.
By trying things and seeing what it’s good and bad at. For example, I no longer let it make data modelling decisions (both for client local data and database schemas), because it had a habit of coding itself into holes it had trouble getting back out of, eg duplicating data that it then has difficulty keeping in sync, where a better model from the start might have been a more normalised structure.
But I came to this conclusion by first letting it try to do everything and observing where it fell down.
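A hypothetical illustration of the duplication problem (names invented):

    from dataclasses import dataclass

    # What it kept producing: the email copied onto every order, which it then
    # struggled to keep in sync whenever the customer's email changed.
    @dataclass
    class OrderDenormalised:
        order_id: int
        customer_email: str

    # The more normalised shape I'd have started from: store the email once
    # and reference the customer by id.
    @dataclass
    class Customer:
        customer_id: int
        email: str

    @dataclass
    class Order:
        order_id: int
        customer_id: int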
> And then at the top of the file, just write some text about what the project is about. If you have a particular file structure and way of organising code that is great to put in as well.
By asking the AI to generate a context.md file, you get an automatically structured overview of the project, including its purpose, file organization, and key components. This makes it easier to onboard new contributors, including other LLMs.
Adding to the opinions of other commenters, I feel that using Cursor is a bad idea. It's a closed-source SaaS, and with these components involved, service quality can swing wildly from day to day, which is not something I'm particularly keen on.
For those of you who, like me, use Neovim, you can achieve "cursor at home" by using a plugin like Avante.nvim or CodeCompanion. You can configure it to suit your preferences.
Just sharing this because I think some might find it useful.
- Push for DRY principles ("make code concise," "ensure good design").
- Swap models strategically; sometimes it's beneficial to design with one model and implement with another. For example, use DeepSeek R1 for planning and Claude 3.5 (or 3.7) for execution. GPT-4.5 excels at solving complex problems that other models struggle with, but it's expensive.
- Insist on proper typing; clear, well-typed code improves autocompletion and static analysis.
- Certain models, particularly Claude 3.7, overly favor nested conditionals and defensive programming. They frequently introduce nullable arguments or union types unnecessarily. To mitigate this, keep function signatures as simple and clean as possible, and validate inputs once at the entry point rather than repeatedly in deeper layers (see the sketch after this list).
- Emphasize proper exception handling. Some models (again, notably Claude 3.7) have a habit of wrapping everything in extensive try/catch blocks, resulting in nested and hard-to-debug code reminiscent of legacy JavaScript, where undefined values silently pass through multiple abstraction layers. Allowing code to fail explicitly is a blessing for debugging purposes; masking errors is like replacing a fuse with a nail.
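A minimal sketch of those last two points (Python, names invented for illustration):

    SCORES = {1: 10, 2: 42}

    def handle_request(payload: dict) -> int:
        # validate once at the entry point...
        if not isinstance(payload.get("user_id"), int):
            raise ValueError("payload must contain an integer 'user_id'")
        return lookup_score(payload["user_id"])

    def lookup_score(user_id: int) -> int:
        # ...so deeper layers take clean, non-nullable arguments and are allowed
        # to fail loudly (here with a KeyError) instead of being wrapped in broad
        # try/except blocks that silently mask the real problem.
        return SCORES[user_id]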
Some additional thoughts on GPT-4.5: it provides BFK-9k experience - eats e̶n̶e̶r̶g̶y̶ ̶c̶e̶l̶l̶s̶ budget ($2 per call!) like there is no tomorrow, but removes bugs with a blast.
In my experience, the gap between Claude 3.7 and GPT-4.5 is substantial. Claude 3.7 behaves like an overzealous intern on stimulants. It delivers results but often includes unwanted code changes, resulting in spaghetti code with deeply nested conditionals and redundant null checks. Although initial results might appear functional, the resulting technical debt makes subsequent modifications increasingly difficult, often leaving the codebase in disarray. GPT-4.5 behaves more like a mid-level developer, thoughtfully applying good programming patterns.
Unfortunately, the cost difference is significant. For practical purposes, I typically combine models. GPT-4.5 is generally reserved for planning, complex bug fixes, and code refinement or refactoring.
In my experience, GPT-4.5 consistently outperforms thinking models like o1. Occasionally, I'll use o3-mini or DeepSeek R1, but GPT-4.5 tends to be noticeably superior (at least, on average). Of course, effectiveness depends heavily on prompts and specific problems. GPT-4.5 often possesses direct knowledge about particular libraries (even without web searching), whereas o3-mini frequently struggles without additional context.
Sometimes I can solve in 15 minutes a bug I had been chasing for days. In other cases it is simpler to write the code by hand, as the AI either does not solve the problem (even a simple one), or does, but at the cost of tech debt, or it takes longer than doing things manually.
AI is just one more tool in our arsenal. It is up to us to decide when to use them. Just because we have a hammer does not mean we need to use it for screws.
> Wouldn’t it be easier instead of juggling with [something] and their quirks to just write the code the old way?
This phrase, when taken religiously, would keep us writing purely in assembly - as there is always "why this new language", "why this framework", "why LLMs".
I have been a religious Cursor + Sonnet user for like past half a year, and maybe I'm an idiot, but I don't like this agentic workflow at all.
What worked for me is having it generate functions and classes ranging from tens of lines of code to the low hundreds. That way I could quickly iterate on its output and check whether it was actually what I wanted.
It created a prompt-check-prompt iterative workflow where I could make progress quite fast and be reasonably certain of getting what I wanted. Sometimes it required fiddling with manually including files in the context, but that was a sacrifice I was willing to make and if I messed up, I could quickly try again.
With these agentic workflows, and thinking models I'm at a loss.
To take advantage of them, you need very long and detailed prompts, they take a long time to generate and drop huge chunks of code on your head. What it generates is usually wrong due to the combination of sloppy or ambiguous requirements by me, model weaknesses, and agent issues. So I need to take a good chunk of time to actually understand what it made, and fix it.
The iteration time is longer, I have less control over what it's doing, which means I spend many minutes of crafting elaborate prompts, reading the convoluted and large output, figuring out what's wrong with it, either fixing it by hand, or modifying my prompt, rinse and repeat.
TLDR: Agents and reasoning models generate 10x as much code, that you have to spend 10x time reviewing and 10x as much time crafting a good prompt.
In theory it would come out as a wash, in practice, it's worse since the super-productive tight AI iteration cycle is gone.
Overall I haven't found these thinking models to be that good for coding, other than the initial project setup and scaffolding.
I think you’re absolutely right and I’ve come to the same conclusion and workflow.
I work on one file at a time in Ask mode, not Composer/Agent. Review every change, and insist on revisions for anything that seems off. Stay in control of the process, and write manually whenever it would be quicker. I won’t accept code I don’t understand, so when exploring new domains I’ll go back with as many questions as necessary to get into the details.
I think Cursor started off this way as a productivity tool for developers, but a lot of Composer/Agent features were added along the way as it became very popular with Vibe Coders. There are inherent risks with non-coders copypasting a load of code they don’t understand, so I see this use case as okay for disposable software, or perhaps UI concept prototypes. But for things that matter and need to be maintained, I think your approach is spot on.
Have you found that this still saves you time overall? Or do you spend a similar amount of time acting as a code reviewer rather than coding it yourself?
Do you have any Cursor rules defined? Those tend to control its habit of trying to go off the rails and solve 42 problems at once instead of just the one.
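For context, rules are just plain-text instructions Cursor picks up from the project; a hypothetical example of the kind of thing I mean:

    # project rules file (illustrative only)
    - Only change the code needed for the task I describe; do not refactor
      unrelated files or functions.
    - Do not revert manual edits I have made since your last change.
    - Ask before introducing new dependencies.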
Do any of these tools use the rich information from the AST to pull in context? Coupled with semantic search for entry points into the AST, it feels like you could do a lot…
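Not that I know of, but something like this is what I have in mind; a rough sketch using Python's ast module (Python sources only, obviously):

    import ast

    def top_level_outline(source: str) -> list[str]:
        """Extract function/class names plus the first docstring line as
        lightweight context, instead of stuffing whole files into the prompt."""
        outline = []
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                doc = ast.get_docstring(node) or ""
                outline.append(f"{node.name}: {doc.splitlines()[0] if doc else ''}")
        return outline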
AI blows me away when asked to write greenfield code. It can get a complex task using hundreds of lines of code right on the first try or perhaps it needs a second try on the prompt and an additional tweak of the output code.
As things move from prototype to production ready the productivity starts to become a wash for me.
AI doesn’t do a good job organizing the code and keeping it DRY. Then it’s not easy for it to make those refactorings later. AI is good at writing code that isn’t inherently defective but if there is complexity in the code it will introduce bugs in its changes.
I use Continue for small additions and tab completions and Claude for large changes. The tab completions are a small productivity boost.
Nice to see these tips. I will start experimenting with prompts to produce better code.
parts of the article are spot on. after the magic has worn off i find it's best to literally treat it like another person. would you blindly merge code from someone else or huge swaths of features? no. i have to review every single piece of code, because later on when there's a bug or new feature you have to have that understanding.
another huge thing for me has been to scaffold a complex feature just to see what it would do. just start out with literal garbage and an idea and as long as it works you can start to see if something is going to pan out or not. then tear it down and do it again with those new assumptions you learned. keep doing it until you have a clear direction.
or sometimes my brain just needs to take a break and i'll work on boilerplate stuff that i've been meaning to do or small refactors.
How does the current state of Cursor agentic workflow compare to Windsurf Editor?
I've been using Windsurf since it was released, and back then it was so far ahead of Cursor it's not even funny. Windsurf feels like it's trained on good programming practices (checking usage of a function in other parts of the project for consistency, double-checking for errors after changes are made, etc). It's also surprisingly fast (it can "search" a 5k-file codebase in, like, 2 seconds). It even asked me once to copy and paste output from Chrome DevTools because it suspected that my interpretation of the result was not accurate (and it was right).
The only thing I truly wish is to have the same experience with locally running models. Perhaps Mac Studio 512GB will deliver :)
To be honest, I switched from Cursor to Windsurf precisely because of how many fewer credits it uses. Even using it daily, I couldn't even remotely hit the limits of the credits in Windsurf. Well, initially they didn't even show how many credits I was using :), now it's more visible, but for $10 per month I still can't hit the limits, and I'm not restricting myself (not abusing it either).
I saw this post on the first page a few minutes ago (published 5 hours ago), but it quickly dropped to the 5th page. Given its comments and points, that seems odd. I had to search to find it again. Any idea why?
What programming languages do you primarily use? I feel that knowing which programming languages an LLM is best at is valuable but often not directly apparent.
> Like mine will keep forgetting about nullish coalescing (??) in JS, and even after I fix it up it will revert my change in its future changes. So of course I put that rule in and it won't happen again.
I'm surprised that this sort of pattern - you fix a bug and the AI undoes your fix - is common enough for the author to call it out. I would have assumed the model wouldn't be aggressively editing existing working code like that.
Yeah I have seen this a bunch of times as well. Especially with deprecated function calls. It generates a bunch of code. I get deprecation warnings. I fix them. Copilot fixes them back. I have to explicitly point out that I made the change for it to not reintroduce the deprecations.
I guess code that compiles is easier to train for, but code free of warnings less so?
There are other examples of changes that I had to tell the AI about so it wouldn't change them back again, but I can't remember any specifics.
It’s due to a problem with Cursor not updating the state of files that have been manually edited since the last time they were used in the chat, so it’ll think the fix is not there and blindly output code that doesn’t have it. The ‘apply’ model is dumb, so it just overwrites the corrected version with the wrong one.
I think the changelog said they fixed it in 0.46, but that’s clearly not the case.
Yep, I asked about this exact problem the other day: https://news.ycombinator.com/item?id=43308153 Having something like “always read the current version of the file before suggesting edits” in Cursor rules doesn’t help; the current file is only read by the agent sometimes. Guess no one has a reliable solution yet.
Cursor in agent mode + Sonnet 3.7 love nothing better than rewriting half your codebase to fix one small bug in a component.
I've stopped using agent unless it's for a POC where I just want to test an assumption. Applying each step takes a bit more time but means less rogue behaviour and better long-term results IME.
Reminds me of my old co-worker who rewrote our code to be 10x faster but 100x more unreadable. AI agent code is often the worst of both of those worlds. I'm going to give this guy's strategy [0] a shot.
If you stopped using agent mode, why use Cursor at all and not a simple plugin for VSCode? Or is there something else that Cursor can do, but a VSCode plugin can't?
I'm sorry, but isn't Cursor just an editor? Maybe an editor shouldn't actually have garbage parts to avoid?
Why not just use an editor that is focused on coding, and then just not use an LLM at all? Less fighting the tooling, more getting your job done with fewer long-term landmines.
There are a lot of editors, and many of them even have native or semi-native LLM support now. Pick one.
Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
Like, a 7900XTX runs you about $1000. You probably already own a GPU that cost more in your gaming rig.
> Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
???
Deepseek R1 doesn't run locally unless you program on a dual socket server with 1 TB of RAM. Or enough cash to have a cabinet of GPUs. The trend for state-of-the-art LLMs is to get bigger over time, not smaller.
Look, I've played with llava and llama locally too, but the benchmarked performance is nowhere near what you can get from the larger cloud providers who can serve hundred-billion+ parameter models without quantization.
You wouldn't use full-fledged R1 for coding. There are distilled models using R1 for coding that get you most of the way there. R1 also doesn't take 1TB of RAM; go read Unsloth's writeup on how to reduce model size without reducing quality (they got it to fit into 131GB): https://unsloth.ai/blog/deepseekr1-dynamic tl;dr: parameter count is where the statistical model lives or dies, not weight precision; you can't blindly shrink every weight, and tooling is learning how not to butcher models.
Also, performance between cloud-run models and models I've run locally with llama.cpp seems actually pretty similar. Are you sure your model fit into your VRAM, and that nothing else was misconfigured? Not fitting into VRAM slows everything to a halt. All the coder models worth looking at fit into 24GB cards in their full-sized variants with the right quantization.
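A minimal sketch of what running a local quantized model looks like with llama-cpp-python, for anyone who wants to test the VRAM point themselves; the model file name and parameters below are placeholders, not a recommendation:

    # Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
    # and a quantized GGUF file is on disk; the file name is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/coder-14b-q4_k_m.gguf",  # hypothetical quantized coder model
        n_gpu_layers=-1,  # offload all layers to the GPU; partial offload is what grinds things to a halt
        n_ctx=8192,       # context window; larger values cost more VRAM
    )

    out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
    print(out["choices"][0]["text"])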
Yes, I'm aware of various rankings. Try all of those models on something that isn't commonly used on a benchmark, and you'll notice that a lot of the proprietary models have trouble actually producing statistically relevant results.
The only one that I've come across that makes me think LLMs will maybe be useful someday is Deepseek R1 and the redistillations based on it.
I've seen HN's fascination with OpenAI's products, and I can't understand why. Even O1 and O3, they're always too little too late, somebody else already is doing something better and throwing it into a HF repo. Must be the Silicon Valley RDF at work.
Cursor overwrites the “code” command line shortcut/alias that’s normally set by VS Code. It does this on every update, with no setting to disable the behavior. There are numerous forum threads asking about manual solutions. This seems like a deliberately anti-user feature meant to get their usage numbers up at all costs. This small thing makes me not trust that the decision-making process at Cursor won’t sell me out as a user.
I tried cursor for a day or two and then asked for a refund... here's why:
* It has terrible support for Elixir (my fav language) because the models are only really trained on python.
* Terrible clunky interface... it would be nice if you didn't have to click around, do modifier ctrl + Y stuff ALL the time.
* The code generated is still riddled with errors or naff (apart from boilerplate)... so I am still *prompt engineering* the crap out of it, which I'm good at, but I can prompt engineer using phind.com...
* The fact that the code is largely broken first time and they still haven't really fixed the context window problem means you have to copy-paste error output back into it... defeating the purpose of an integrated IDE imo.
* The free demo mode stops working after generating one function... if I had been given more time to evaluate it fully I would never have signed up. I signed up to see if it was any good.. which it isn't.
Too bad they removed the ability to use Chat (rebranded as Ask) with your own API keys in version 0.47. Now every feature requires a subscription.
Natural for Cursor to nudge users towards their paid plans, but why provide the ability to use your own API keys in the first place if you're going to make them useless later?
Nice. Also, you can use project-specific structure and markdown files to ensure the AI organizes content correctly for your use case. We are using it on 800k lines of golang and it works well. https://getstream.io/blog/cursor-ai-large-projects/
What do you consider "heavy"? Is it optimising an algorithm or "rewrite this whole codebase in <a different language>"?
Refactoring a critical function that is very large and complex. Yeah, maybe it shouldn't be so large and so complex, but it is and that's the codebase we have. I'm sure many other companies do too.
That's not how apex productivity folks have used any IDE productivity leap, including this one.
You don't outsource your thinking to the tool; you do the thinking and let the tool type it for you.
You're missing the step where I have to articulate (and type) the prompt in natural language well enough for the tool to understand and execute. Then if I fail, I have to write more natural language.
You said just the same in another of your posts:
> if you can begin to describe the function well
So I have to learn how to describe code rather than just writing it as I've done for years?
Like I said, it's good for the small stuff, but bad for the big stuff, for now at least.
> So I have to learn how to describe code rather than just writing it as I've done for years?
If we keep going down this path, we might end up inventing artificial languages for the purpose of precisely and unambiguously describing a program to a computer.
You mean logic programming? https://en.wikipedia.org/wiki/Logic_programming
It's been around for decades. In fact this was the first approach to doing AI.
In logic programming you basically write a concrete set of test cases and the compiler generates the code for which the test cases hold 'true'.
In other words you get a language to 'precisely and unambiguously describe a program', as you said. Compiler writes the code for you.
Sad that GP's wit was lost on you.
Exactly my point: you don't ask the LLM to give you the whole function. That would be too much English work, because it means you'd need to write down the contract of the function as a concrete list of and/or conditions (and lists of lists).
You ask it to give you one block at a time.
iterate over the above list and remove all strings matching 'apple'
open file and write the above list etc etc kind of stuff.
Notice how the English here can be interpreted only one way, but the LLM is now a good, intelligent coding assistant.
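For what it's worth, prompts at that granularity map to a couple of lines each; a rough sketch of the kind of blocks they'd come back with, where the list contents and file name are made up for the example:

    # Illustrative only; 'items' and the output path are assumptions for the example.
    items = ["apple pie", "banana", "pineapple", "cherry"]

    # "iterate over the above list and remove all strings matching 'apple'"
    # (read here as: containing 'apple')
    items = [s for s in items if "apple" not in s]

    # "open file and write the above list"
    with open("items.txt", "w") as f:
        f.write("\n".join(items))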
>>I think in code, I'd rather just write in code.
Continue to think, just make the LLM type out the outcome of your ideas.
I'm sure you have a great point to make, but you're making it very poorly here.
Experienced developers develop fluency in their tools, such that writing narrow natural-language directives like the ones you suggest is grossly regressive. Often, the tasks don't even go through our head in English like that; they simply flow from the fingers onto the screen in the language of our code, libraries, and architecture. Are you familiar with that experience?
What you suggest is precisely like a fluently bilingual person, who can already speak and think in beautiful, articulate, and idiomatic French, opting to converse to their Parisian friend through an English-French translation app instead of speaking to them directly.
When applied carefully, that technique might help someone who wants to learn French get more and deeper exposure than without a translation app, as they pay earnest attention to the input and output going through their app.
And that technique surely helps someone who never expects to learn French navigate their way through the conversations they need to have in a sufficient way, perhaps even opening new and eventful doors for them.
But it's an absolutely absurd technique for people whose fluency is such that there's no "translating" happening in the first place.
You can see that right?
>>Experienced developers develop fluency in their tools
>>You can see that right?
I get it, but this is as big a paradigm shift as Google and the Internet were to people in the 90s. Sometimes how you do things changes, and that paradigm becomes too big of a phenomenon to neglect. That's where we are now.
You have to understand that sometimes a trend or a phenomenon is so large that fighting it is pointless and somewhat resembles luddite behaviour. Not moving on with the times is how you age out and get fired. I'm not talking about a normal layoff, but more like becoming totally irrelevant to whatever is happening in the industry at large.
That could totally be something that we encounter in the future, and perhaps we'll eventually be able to see some through line from here to there. Absolutely.
But in this thread, it sounds like you're trying to suggest we're already there or close to it, but when you get into the details, you (inadvertently?) admitted that we're still a long ways off.
The narrow-if-common examples you cited slow experienced people down rather than speed them up. They surely make some simple tasks more accessible to inexperienced people, just like in the translation app example, and there's value in that, but it represents a curious flux at the edges of the industry -- akin to VBA or early PHP -- rather than a revolutionary one at the center.
It's impactful, but still quite far from a paradigm shift.
So you do in fact agree that the tool can only be trusted to do small chunks of basic action and can't be used to do anything heavy.
It writes blocks of 50 - 100 lines fairly fast in one go, and I work in such chunks one at a time. And I keep going, as I already have a good enough idea of what the program should look like. I can write something like a 10,000-line Perl script (100 such iterations) in no time. Like in one or two working days.
Thats pretty fast.
If you are telling me I ask it to write the entire 20,000 line script in one go, that's not how I think. Or how I go about approaching anything in my life, let alone code.
To go a far distance, I go in cycles of small distances, and I go a lot of them.
I've spent a lot of time trying to make Aider and Continue do something useful, mainly with Qwen coder models. In my experience they suck. Maybe they can produce some scaffolding and boilerplate but I already have deterministic tools for that.
Recently I've tried to make them figure out an algorithm that can chug through a list of strings and collect certain ones: grouping lines that match one pattern under the most recent line that matched another pattern, in a new list. They consistently fail, and not in ways that are obvious at a glance. Fixing the code manually takes longer than just writing the code.
Usually it compiles and runs, but does the wrong thing. Sometimes they screw up recursion and only collect one string. Sometimes they add code for collecting the strings that are supposed to be grouped but don't use it, and as of yet it's consistently wrong.
They also insist on generating obtuse and wrong regex, sometimes mixing PCRE with some other scheme in the same expression, unless I make threats. I'm not sure how other people manage to synthesise code they feel good about; maybe that's something only the remote models can do, and I'm certainly not sending my commercial projects to those.
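For context, a minimal hand-written sketch of the grouping task described above, assuming one made-up regex marks the "header" lines and another marks the lines to collect under them (both patterns are assumptions for illustration):

    import re

    # Assumed patterns, purely for illustration: headers look like "[section]",
    # members look like "key=value".
    HEADER = re.compile(r"^\[.+\]$")
    MEMBER = re.compile(r"^\w+=.*$")

    def group_lines(lines):
        groups = []      # each entry: [header, member, member, ...]
        current = None
        for line in lines:
            if HEADER.match(line):
                current = [line]
                groups.append(current)
            elif MEMBER.match(line) and current is not None:
                current.append(line)
        return groups

    print(group_lines(["[a]", "x=1", "noise", "[b]", "y=2"]))
    # [['[a]', 'x=1'], ['[b]', 'y=2']]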
Then only use it for the small tasks? There's one button you have to click to turn it off.
I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
As someone old enough to have built websites in Notepad.exe it's totally reasonable that I ask my teams to turn off syntax highlighting, brace matching, and refactoring tools in VSCode. I didn't have them when I started, so they shouldn't use them today. Modern IDE features are just making them lazy.
/s
Yes. It is reasonable to ask them to turn off these things temporarily as an exercise, if you think it will help junior engineers write better code.
All teaching works this way. Would it be totally reasonable to have junior pilots only practice with autopilot?
I think you're mistaking programmer productivity with AI that generates all the code for you, allowing you to switch off your brain completely. Prompt engineering code is not the same skill as programming, and being good at it does not mean you actually understand how code or software works.
Not all change is progress.
Change comes with pros and cons. The pros need to outweigh the cons (and probably significantly so) for change to be considered progress.
Syntax highlighting has the pro of making code faster to visually parse for most people at the expense of some CPU cycles and a 10 second setting change for people for whom color variations are problematic. It doesn't take anything away. It's purely additive.
AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
My junior developers think I don't know they are using AI coding tools. I discovered it about 2 months into them doing it, and I've been tracking their productivity both before and after. In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time. Or at all. Even basic things have to be rewritten because they aren't suitable for purpose. And in our pair programming sessions, I see them frozen up now, where they weren't before they started using the tools. I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I tried to use AI code generation once to fill in some ASP.NET Core boilerplate for setting up authentication. Should be basic stuff. Should be 3 or 4 lines of code. I've done it before, but I forgot the exact lines and had been told AI was good for this kind of lazy recall of common tasks. It gave me a stub that had a comment inside, "implement authentication here". Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation. And it still wasn't done. I haven't touched AI code gen since.
So IDK. I'm very skeptical of the claims that AI is writing significant amounts of working code for people, or that it at all rivals even a moderately smart junior developer (say nothing of actually experienced senior). I think what's really happening is that people are spending a lot of time spinning the roulette wheel, always betting on 00, and then crowing they're a genius when it finally lands.
AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
I'm very skeptical of the claims that AI is writing significant amounts of working code for people
You may be right, but people write far too much code as it is. Software development should be about thinking more than typing. Maybe AI's most useful feature will be writing something that's slightly wrong in order to get devs to think about a good solution to their problem and then they can just fix the AI's code. If that results in better software then it's a huge win worth billions of dollars.
The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
> At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
I am extremely skeptical that LLM-based generative AI running in silicon-based digital computers will improve to a significant degree over what we have today. Ever.
GPT-2 to GPT-3 was a sea-change improvement, but ever since then, new models have only been incrementally improving, despite taking exponentially more power and compute to train. Couple that with the fact that processors are only getting wider, not faster or less energy consuming, and without an extreme change in computing technology, we aren't getting there with LLMs.
Either the processors or the underlying AI tech need to change, and there is no evidence this is the case.
> The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
I have no idea what you're even trying to say but this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
I have no idea what you're even trying to say with this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
I'm saying there is a lot of value in tools that are merely 'better than the status quo'. An AI assistant doesn't need to be as good as a dev in order to be useful.
Same. I've heard that this area is improving exponentially for years now, but I can't really say the results I'm getting are any better than what I originally experienced with ChatGPT.
The dev tooling has gotten better; I use the integrated copilot every day and it saves me from writing a lot of boilerplate.
But it's not really replacing me as a coder. Yeah I can go further faster. I can fill in gaps in knowledge. Mostly, I don't have to spend hours on forums and stack overflow anymore trying to work out an issue. But it's not replacing me because I still have to make fine-grained decisions and corrections along the way.
To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
And like you I don't see how to get there from where we are. I think we're at a local maxima here.
To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
And like you I don't see how to get there from where we are. I think we're at a local maxima here.
To continue the car analogy - are you really suggesting we're at 'peak car'? You don't believe that cars in 20 years time are going to be significantly better than the cars we have today? That's very pessimistic.
There was a meme from back when - "if cars advanced like computers we'd be getting 1000 miles per gallon by now".
Thinking back to the car I had 20 years ago, it's not all that different from the car I have now.
Yes, the car I have now has a HUD, Carplay, Wireless iPhone charging, an iPhone app, adaptive cruise control, and can act as a wifi hotspot. But fundamentally it still does the same thing in the same way as 20 years ago. Even if we allow for EVs and Hybrid cars, it's still all mostly the same. Prius came out in 2000.
And now we've reached the point where computers advance like cars. We're writing code in the same languages, the same OS, the same instruction set, for the same chips as we did 20 years ago. Yes, we have new advancements like Rust, and new OSes like Android and iOS, and chipsets like ARM are big now. But iPhone/iPad/iMac, c/C++/Swift, OSX/MacOS/iOS, PowerPC/Intel/ARM.... fundamentally it's all homeomorphic - the same thing in different shapes. You take a programmer from the 70s and they will not be so out of place today. I feel like I'm channeling Bret Victor here: https://www.youtube.com/watch?v=gbHZNRda08o
And that's not for lack of advancements in languages, Os, instruction sets, and hardware architectures, it's for a lack of investment and commercialization. I can get infinite money right now to make another bullshit AI app, but no one wants to invest in an OS play. You'll hear 10000 excuses about how MS this and Linux that and it's not practical and impossible and there's no money in it, so on and so forth. The network effects are too high, and the in-group dynamic of keeping things the way they are is too strong.
But AGI? Now that's something investors find totally rational and logical and right around the corner. "I will make a fleet of robot taxis that can drive themselves with a camera" will get you access to unlimited wallets filled with endless cash. "I will advance operating systems past where they have been languishing for 40 years" is tumbleweeds.
The cars we have today are not so far off from the cars of 100 years ago. So yes, I highly doubt another 20 years of development, after all the low hanging fruit has already been picked, will see much change at all.
>>In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time.
Most people are using it to finish work soon, rather than use it to do more work. As a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products.
>>I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I understand you and I grew up in a different era. But life getting easier for the young isn't exactly something we must resent. Things are only getting easier with time and have been like this for a few centuries. None of this is wrong.
>>Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation.
Honestly this largely reads like how my dad would describe technology from the 2000s. It was always that he was better off without it. Whether that was true or false is up for debate, but the world was moving on.
> As a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products.
I think you just hit the core point that splits people in these discussions.
For many senior engineers, we see our jobs are to build better and more lasting products. Correctness, robustness, maintainability, consistency, clarity, efficiency, extensibility, adaptability. We're trying to build things that best serve our users, outperform our competition, enable effective maintenance, and include the design foresight that lets our efforts turn on a dime when conditions change while maintaining all these other benefits.
I have never considered myself striving towards "newer and bigger" projects and I don't think any of the people I choose to work with would be able to say that either. What kind of goal is that? At best, that sounds like the prototyping effort of a confused startup that's desperately looking to catch a wave it might ride, and at worst it sounds like spam.
I assure you, little of the software that you appreciate in your life has been built by senior engineers with that vision. It might have had some people involved at some stage who pushed for it, because that sort of vision can effectively kick a struggling project out of a local minimum (albeit sometimes to worse places), but it's unlikely to have been a seasoned "senior engineer" being the one making that push and (if they were) they surely weren't wearing that particular hat in doing so.
I don't get this idea that to build a stable product you must make your life as hard as possible.
One can use AI AND build stable products at the same time. These are not exactly opposing goals, and beyond that, assuming that AI will always generate bad code is itself wrong.
Very likely people will build more stable and larger products using AI than ever before.
I understand and empathise with you; moving on is hard, especially when these kinds of huge paradigm-changing events arrive, and especially when you are no longer in the upswing of life. But the arguments you are making are very similar to those made by boomers about desktops, the internet, and even mobile phones. People have argued endlessly that the old way was better, but things only get better with newer technology that automates more than ever before.
I don't feel like you read my comment in context. It was quite specifically responding to the GP's point of pursuing "bigger and better" software, which just isn't something more senior engineers would claim to pursue.
I completely agree with you that "one can use ai AND build stable products at the same time", even in the context of the conversation we're having in the other reply chain.
But I think we greatly disagree about having encountered a "paradigm changing event" yet. As you can see throughout the comments here, many senior engineers recognize the tools we've seen so far for what they are, they've explored their capabilities, and they've come to understand where they fit into the work they do. And they're simply not compelling for many of us yet. They don't work for the problems we'd need them to work for yet, and are often found to be clumsy and anti-productive for the problems they can address.
It's cute and dramatic to talk about "moving on is hard" and "luddism" and some emotional reaction to a big scary imminent threat, but you're mostly talking to exceedingly practical and often very lazy people who are always looking for tools to make their work more effective. Broadly, we're open to and even excited about tools that could be revolutionary and paradigm-changing, and many of us even spend our days trying to discover and build those tools. A more accurate read of what they're saying in these conversations is that we're disappointed with these tools and, in many cases, just find that they don't nearly deliver on their promise yet.
> life getting easier for the young isn't exactly something we must resent.
I don't see how AI is making life easier for my developers. You seem to have missed the point that I have seen no external sign of them being more productive. I don't care if they feel more productive. The end result is they aren't. But it does seem to be making life harder for them because they can't seem to think for themselves anymore.
> a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products
Well then, we're in agreement. I should reveal to my juniors that I know they are using AI and that they should stop immediately.
> I understand you and I grew up in a different era. But life getting easier for the young isn't exactly something we must resent.
Of course not. But, eventually these young people are going to take over the systems that were built and understood by their ancestors. History shows what damage can be caused to a system when people who don't fully understand and appreciate how it was built take it over. We have to prepare them with the necessary knowledge to take over the future, which includes all the warts and shit piles.
I mean, we've spent a lot of our careers trying to dig ourselves out of these shit piles because they suck so bad, but we never got rid of them, we just hid them behind some polish. But it's all still there, and vibe coders aren't going to be equipped to deal with it.
Maybe the hope is that AI will reach god-like status and just fix all of this for us magically one day, but that sounds like fixing social policy by waiting for the rapture, so we have to do something else to assure the future.
Is this supposed to be funny?
Funny / sad. GP is just highlighting the all too common attitude of people who grew up using new tech (graphing calculators, Wikipedia, etc) who reach a certain age and suddenly new tech is ruining the youth of today.
It’s just human nature, you can decide if it’s funny or sad or whatever
Neither of you have comprehended the part of my post where I talk about myself and my own skills.
Hiding behind the sarcasm tag to take the piss out of people younger than you, I don't think that's very funny. The magnetised needle and a steady hand gag from xkcd, now that is actually funny.
Yes.
>>but it falls down with anything heavy.
If you are using LLMs to write anything more than a if(else)/for block at a time you are doing it wrong.
>>I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
When I first started work, my employer didn't provide internet access to employees. Their argument was always "how would you code if there was no internet connection, out there in the real world?" As it turns out, they were not only worried about the wrong problem, they got the whole paradigm of this new world wrong.
In short, it was not worth building anything at all for a world where the internet doesn't exist.
>>then one day it's not cheap ...
Again you are worried about the wrong thing: your worry should not be what happens when it's no longer cheap, but what happens when it, as a matter of fact, gets cheaper. Which it will.
> If you are using LLMs to write anything more than a if(else)/for block at a time you are doing it wrong
Then what value are they actually adding?
If this is all they are capable of, surely you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
Why are people so bullish on them?
This is how I feel. I mentioned this to a couple of friends over a beer and their answer was that there are many not "decently competent programmer"s in the industry currently and they benefit immensely from this technology, at the expense of the stability and maintainability of the system they are working on.
English to Code translation.
That said, they are fairly context-aware as to what you are asking. So they can save a lot of RTFM and code/test cycles. At times they can look at the functions that are already built and write new ones for you, if you can begin to describe the function well.
But if you want to write a good function, one written to fit tightly to a specification, it's too much English. You need to describe in steps what is to be done, plus exceptions, and at some point you are just doing logic programming (https://en.wikipedia.org/wiki/Logic_programming), in the sense that the whole English text looks like a list of and/or situations plus exceptions.
So you have to go one atomic step (a decision statement or a loop) at a time. But that's a big productivity boost too, the reason being you get lots of code in place without having to manually type it out.
>>you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
Honestly speaking, most coding is manually laborious if you don't know touch typing, and even if you do, it's a chore.
I remember when I started using co-pilot with react it was doing a lot of otherwise typing work I'd have to do.
>>I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
IMO, over the years my brain has seen so many code patterns, debugging situations, and things to anticipate and assemble as I go, that having an intelligent typing assistant is a major productivity boost.
>>Why are people so bullish on them?
Eventually newer programming languages will come along and people will build larger things.
Honestly, a lot of the problems people have with programming that they use AI to solve can be solved with better language design and dev tools.
For example, I like LLMs because they take care of a lot of the boilerplate I have to write.
But I only have to write that boilerplate because it's part of the language design. Advances in syntax and programming systems can yield similar speedups in programming ability. I've seen a 100x boost in productivity that came down to switching to a DSL versus C++.
Maybe we need more DSLs, better programming systems, better debugging tools, and we don't really need LLMs the way LLM makers are telling us? LLMs only seem so great because our computer architecture, languages and dev tooling and hardware are stuck in the past.
Instead of being happy with the Von Neumann architecture, we should be exploring highly parallel computer architectures.
Instead of being happy with imperative languages, we should be investing heavily in studying other programming systems and new paradigms.
Instead of being happy coding in a 1D text buffer, we should be investing more in completely imaginative ways of building programs in AR, VR, 3D, 2D.
LLMs are going to play a part here, but I think really they are a band-aid to a larger problem, which is that we've climbed too high in one particular direction (von-neuman/imperative/text) and we are at a local maxima. We've been there since 2019 maybe.
There are many other promising peaks to climb, avenues of research that were discovered in the 60s/70s/80s/90s have been left to atrophy the past 30 years as the people who were investigating those paths refocused or are now gone.
I think all these billions invested in AI are going to vaporize, and maybe then investors will focus back on the fundamentals.
LLMs are like the antenna at the top of the Empire State Building. Yes, you can keep going up if you climb up there, but it's unstable and eventually there really is a hard limit.
If we want to go higher than that, we need to build a wider and deeper foundation first.
Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.
Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
If you prune out context from the initial prompt, instead of reasoning on richer context, the llm reasons only on the prompt itself (w/ no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.
On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.
Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.
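To make the pruning pattern described above concrete, a rough sketch of "stuff whole files into the prompt" versus "let the model request file slices via tool calls". This is not Cursor's code; call_model and the tool schema are stand-ins invented for the illustration:

    # Illustration only; call_model is a dummy stand-in for a real LLM API.
    def call_model(messages, tools):
        return {"content": "(model reply would go here)"}

    def read_range(path, start, end):
        with open(path) as f:
            return "".join(f.readlines()[start:end])

    def answer(question, attached_files):
        # Option A: stuff whole files into the prompt -- expensive, but any
        # "thinking" happens with the code in view.
        # prompt = question + "\n\n" + "\n\n".join(open(p).read() for p in attached_files)

        # Option B: let the model request slices until it is satisfied -- cheaper,
        # but up-front reasoning happens before it has actually seen the code.
        messages = [{"role": "user", "content": question}]
        while True:
            reply = call_model(messages, tools=["read_range"])
            if reply.get("tool") == "read_range":
                messages.append({"role": "tool", "content": read_range(**reply["args"])})
            else:
                return reply["content"]

    print(answer("Why is the build failing?", ["src/build.py"]))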
>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.
In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.
> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.
There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.
There's also work being done on think - summarise - think - summarise - etc. And on various "RAG"-like thinking.
This is only surface-level deep. Cursor already has quotas for their paid plans and usage-based pricing for their larger models, which I run into, falling over to their usage-based model every month.
Imo most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way"™ to build that context window automatically to reach a coding panacea. They just aren't there yet.
If you’re going to pay on the margin, why not use those incremental dollars running the same requests on cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider apis with cline always does a much better job for me.
> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.
what i mean is that their implementation (thinking only on the first response) renders zero benefit because it doesn’t see the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, then performance would be great but, so far, this is not what they are doing (yet). It also dramatically increases the cost of running the same operation.
But the way those models work is to run everything once the function calls come in. Are you saying cursor is not using the model you selected on function calls responses?
This sounds like a Cursor issue, not something that affects reasoning models in general.
edit: Ah, I see what you mean now.
That's my point. By offering unlimited requests (500 fast requests + unlimited slow requests) to people paying a fixed $20/mo, Cursor has put itself into a ruthless marginal-cost optimization game where one of its biggest levers for success is reducing context sizes and discouraging thinking after every function call.
Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
Reflecting on your comment I realized that using a huge amount of GPUs is akin to a Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of reading/writing heads with an unbounded number of rules, we can solve decidability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to postulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decidability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...
Or put another way, isn't the promise of software that is capable of generating any software given a natural language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.
> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.
You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.
I love cline but i’ve never tried the gemini models with it. I’ll give it a shot tonight, thanks for the tip!
I had been using Cursor for a month until a day when my house had no internet; then I realized that I had started forgetting how to write code properly.
I had the exact same experience, pretty sure this happens in most cases, people just don’t realize it
Just get a Mac Studio with 512GB RAM and run a local model when the internet is down.
Which local model would you recommend that comes close to cursor in response quality? I have tried deepseek, mistral, and few others. None comes close to quality of cursor. I keep coming back to it.
Possibly useful comment on local models, perhaps also fitting on machines with less ram:
https://news.ycombinator.com/item?id=43340989
It's a backup plan, who cares if the quality matches? If it did, Cursor would not be in question.
A $10k backup plan? That makes sense. No wonder you used a throwaway.
[flagged]
The hardware isn't free. Someone asks a question and your answer is who cares about $10k of hardware hanging around as a subpar backup?
See, the thing is, I never wanted to comment on the cost or feasibility of the hardware at all. What I was commenting on was that any backup plan is expected to be subpar by its very nature, and if it isn't, it should instantly be promoted. If you'll notice, that was 100% of what I said. I was adding to the pile of "this plan is stupid". Cursor has an actual value proposition.
Of course then you disrespected me with a rude ad hominem and got a rude response back. Ignoring the point and attacking the person is a concession.
For the record, I and many others use throwaways every single thread. This isn't and shouldn't be reddit.
You're right, I shouldn't have said the throwaway bit, sorry. However, you're ignoring the context of the conversation, which is a $10k piece of hardware. I don't know what you expected to add to the conversation by saying "who cares?" when someone asks for advice, in context or even in isolation.
wrong. where the user is asking for recommended models (for offline use), they’re not saying “yes in fact I will burn $10000 on a computer”, not at all lol
Can one run cursor with local LLMs only?
Back up your $20 a month subscription with a $2000 Mac Studio for those days your internet is down.
Peak HN.
Lol he suggested a $10k Mac Studio
But you can at least resell that $10k Mac Studio, theoretically.
Trying to do that with a 32 GB M1 laptop, and it's hard to get even 1000 euros for it in the Netherlands, whereas the refurbished price is double that.
Even more absurd is that Mac Studio with 512GB RAM costs around $9.5K
> Peak HN.
But, alas, not a single upvote.
Maybe this "backup" solution, developed into an affordable open source solution on commodity hardware that keeps the model and code local and private at all times, is the actual solution we need.
Let's say a cluster of Raspberry Pis / low-powered devices producing results as good as Claude 3.7 Sonnet. Would it be completely infeasible to create a custom model that is trained on your own code base and might not be a fully fledged LLM, but provides similar features to Cursor?
Have we all gone bonkers sending our code to third parties? The code is the thing you want to keep secret unless you're working on an open source project.
2000$? You wish!
Lol, not sure where I got the 2k from. Brain fart, but I'll let it stand :D
But then I’d be using a Mac, and that would slow my development down and be generally miserable.
lol
The UX of tools like these is largely constrained by how good they are with constructing a complete context of what you are trying to do. Micromanaging context can be frustrating.
I played with aider a few days ago. Pretty frustrating experience. It kept telling me to "add files" that are in the damn directory that I opened it in. "Add them yourself" was my response. Didn't work; it couldn't do it somehow. Probably once you dial that in, it starts working better. But I had a rough time with it creating commits with broken code, not picking up manual file changes, etc. It all felt a bit flaky and brittle. Half the problem seems to be simple cache coherence issues and me having to tell it things that it should be figuring out by itself.
The model quality seems less important than the plumbing to get the full context to the AI. And since large context windows are expensive, a lot of these tools are cutting corners all the time.
I think that's a short-term problem. Not cutting those corners is valuable enough that a logical end state is tools that don't cut them and cost a bit more. Just load the whole project. Yes, it will make every question cost $2-3 or something like that. That's expensive now, but if it drops by 20x we won't care.
Basically, large models that support huge context windows of millions/tens of millions of tokens cost something like the price of a small car and use a lot of energy. That's OK. Lots of people own small cars, because they are kind of useful. AIs that have a complete, detailed context of all your code, requirements, intentions, etc. will be able to do a much better job than one that has to guess all of that from a few lines of text. That would be useful. And valuable to a lot of people.
Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
> aider [...] It kept telling me to "add files" that are in the damn directory that I opened it in.
That's intentional, and I like it. It limits the context dynamically to what is necessary (of course it makes mistakes). You can also add files with placeholders and in a number of other ways, but most of the time I let Aider decide. It has a repomap (https://aider.chat/docs/repomap.html), gradually builds up knowledge, and makes proposals based on this and other information it has gathered, with token costs and the context window in mind.
As for manual changes: aider is opinionated regarding the role of Git in your workflow. At first glance this repels some people, and some stick to that opinion. For others, it is exactly one of the advantages, especially in combination with the shell-like nature of the tool. But the standard Git handling can still be overridden. For me personally, the default behavior has become more and more smooth and second nature. And the whole thing is scriptable; I've only begun to use the possibilities.
In general: Tools have to be learned, impatient one-shot attempts are simply not enough anymore.
> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
OTOH currently the LLM companies are probably taking a financial loss with each token. Wouldn't be surprised if the price doesn't even cover the electricity used in some cases.
Also e.g. Gemini already runs on Google's custom hardware, skipping the Nvidia tax.
> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
That still leaves us with an ungodly amount of resources used both to build the GPUs and to run them for a few years before having to replace them with even more GPUs.
It's pretty amazing to me how quickly the big tech companies pivoted from making promises to "go green" to buying as many GPUs as possible and burning through entire power plants' worth of electricity.
Try Claude Code. It figures out context by itself. I’m having a lot of success with it for a few days now, whereas I never caught on with Cursor due to the context problem.
I made a quick prototype to demonstrate what I think AI code assistance should be.
https://github.com/hibernatus-hacker/ai-hedgehog
This is a simple code assistant that doesn't get in your way and makes sure you are coding (not losing your ability to program).
You configure a Replicate API token, install the tool, and point it at your code base.
When you save a file, it asks the LLM for advice and feedback on the file as a "senior developer".
Run this alongside your favorite editor to get feedback from an LLM as you work (on open source code; nothing you don't want third parties to see).
You are still programming and using your brain but you have some feedback when you save files.
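The core of it is just a save hook; a minimal sketch of the idea looks roughly like this (not the actual ai-hedgehog code - the watchdog/replicate wiring and the model name here are just illustrative):

    # Rough sketch: watch a directory, send saved files to an LLM for review.
    # Assumes `pip install watchdog replicate` and REPLICATE_API_TOKEN in the env.
    import sys
    import time
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler
    import replicate

    PROMPT = ("You are a senior developer. Review this file and give brief, "
              "concrete feedback:\n\n{code}")

    class ReviewOnSave(FileSystemEventHandler):
        def on_modified(self, event):
            if event.is_directory or not event.src_path.endswith(".py"):
                return
            with open(event.src_path) as f:
                code = f.read()
            # Model name is illustrative; any hosted code-capable model would do.
            output = replicate.run(
                "meta/meta-llama-3-70b-instruct",
                input={"prompt": PROMPT.format(code=code)},
            )
            print(f"--- feedback for {event.src_path} ---")
            print("".join(output))

    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(ReviewOnSave(), sys.argv[1], recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()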
The feedback is less computationally expensive and less fraught with difficulty than actually getting code out of LLMs, so it should work with much less powerful models.
It would be nice if there was a search built in so it could search for useful documentation for you.
I've found tools like Cursor useful for prototyping and MVP development. However, as the codebase grows, they struggle. It's likely due to larger files or an increased number of them filling up the context window, leading to coherence issues. What once gave you a speed boost now starts to work against you. In such cases, manually selecting relevant files or snippets from them yields better results, but at that point it's not much different from using the web interface to something like Claude.
I had that same experience with Claude Code. I tried to do a 95% "Idle Development RPG" approach to developing a music release organization software. At the beginning, I was really impressed, but with more and more complexity, it becomes increasingly incoherent, forgetting about approaches and patterns used elsewhere and reinventing the wheel, often badly.
Or the context window not being large enough for all the obscure functions and files to fit. I am too basic to have dug deep enough, but a simple (automatic) documentation context for the entire project would certainly improve things for me.
Agreed. One useful tip is to have Cursor break up large files into smaller files. For some reason, the model doesn't do this naturally. I've had several Cursor experiments grow into 3000+ line files because it just keeps adding.
Once the codebase is reasonably structured, it's much better at picking which files it needs to read in.
I've tried Cursor a couple of times but my complaint is always the same: why fork VS Code when all this functionality could just be an extension, same as Copilot?
Some VSCode extensions don't work, you need to redo all your configuration, add all your workspaces... and the gain vs Copilot is not that high
> why forking VS Code when all this functionality could just be an extension, same as Copilot does?
Have you programmed extensions for VSCode before? While it seems like a fairly extensible system overall, the editor component in particular is very restrictive. You can add text (that's what extensions like ErrorLens and GitLens are doing), inlay hints, and on-hover popup overlays (those can only trigger on words, and not on punctuation). What Cursor does - the automatic diff-like views of AI suggestions with graphic outlines, floating buttons, and whatnot right on top of the text editing view - is not possible in vanilla VSCode.
This was originally driven by the necessity of tighter control over editor performance. In its early days VSCode was competing with Atom - another extensible JS-powered editor from GitHub - and while Atom had an early lead due to its larger extension catalog, VSCode ultimately won the race because they managed to maintain lower latency in their text editor component. Nowadays they still don't want to introduce extra extension points to it, because newer, faster editors pop up all the time.
> and the gain vs Copilot is not that high
I think that's (at least part of) your answer. More friction to move back from an entirely separate app rather than disabling an extension.
I read this point in the article with bafflement:
"Learn when a problem is best solved manually."
Sure, but how? This is like the vacuous advice for investors: buy low and sell high.
By trying things and seeing what it's good and bad at. For example, I no longer let it make data modelling decisions (both for client local data and database schemas), because it had a habit of coding itself into holes it had trouble getting back out of, e.g. duplicating data that it then has difficulty keeping in sync, where a better model from the start might have been a more normalised structure.
But I came to this conclusion by first letting it try to do everything and observing where it fell down.
How can I stop Cursor from sending .env files with secrets as plain text? Nothing I tried from the docs works.
This is a huge issue that was already raised on their forums and it's very surprising they didn't address it yet.
[0] https://forum.cursor.com/t/environment-secrets-and-code-secu...
I have been adding .env files to .cursorignore so far.
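For reference, mine is only a few lines at the repo root (assuming .cursorignore follows the usual .gitignore-style patterns, which is how I understand it):

    .env
    .env.*
    *.pem
    secrets/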
I can see from that thread that the approach hasn't been perfect, but it seems that the last two releases have tried to address that:
“0.46.x : .cursorignore now blocks files from being added in chat or sent up for tab completions, in addition to ignoring them from indexing.”
> And then at the top of the file, just write some text about what the project is about. If you have a particular file structure and way of organising code that is great to put in as well.
By asking the AI to generate a context.md file, you get an automatically structured overview of the project, including its purpose, file organization, and key components. This makes it easier to onboard new contributors, including other LLMs.
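As a rough illustration, the skeleton I ask it to produce looks something like this (the section names are just my own preference, nothing standard):

    # Project context
    ## Purpose
    One paragraph on what the app does and who uses it.
    ## File structure
    src/    application code, grouped by feature
    tests/  mirrors src/
    ## Conventions
    Naming, error handling, and any patterns the AI should follow.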
Compounding the opinions of other commenters, I feel that using Cursor is a bad idea. It's a closed-source SaaS, and with those components involved, service quality can swing wildly from one day to the next, which is not something I'm particularly keen on.
This is true of every single service provider outside of fully OSS solutions, which are a teeny tiny fraction of the world's service providers.
There's always Aider with local models!
For those of you who, like me, use Neovim, you can achieve "cursor at home" by using a plugin like Avante.nvim or CodeCompanion. You can configure it to suit your preferences.
Just sharing this because I think some might find it useful.
Other useful things I've discovered:
- Push for DRY principles ("make code concise," "ensure good design").
- Swap models strategically; sometimes it's beneficial to design with one model and implement with another. For example, use DeepSeek R1 for planning and Claude 3.5 (or 3.7) for execution. GPT-4.5 excels at solving complex problems that other models struggle with, but it's expensive.
- Insist on proper typing; clear, well-typed code improves autocompletion and static analysis.
- Certain models, particularly Claude 3.7, overly favor nested conditionals and defensive programming. They frequently introduce nullable arguments or union types unnecessarily. To mitigate this, keep function signatures as simple and clean as possible, and validate inputs once at the entry point rather than repeatedly in deeper layers (see the sketch after this list).
- Emphasize proper exception handling. Some models (again, notably Claude 3.7) have a habit of wrapping everything in extensive try/catch blocks, resulting in nested and hard-to-debug code reminiscent of legacy JavaScript, where undefined values silently pass through multiple abstraction layers. Allowing code to fail explicitly is a blessing for debugging purposes; masking errors is like replacing a fuse with a nail.
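To make the last two points concrete, this is the kind of shape I nudge the models toward (a toy Python sketch; the names are made up):

    # Validate once at the boundary; inner functions take clean, non-optional types.
    def handle_order(payload: dict) -> float:
        quantity = payload.get("quantity")
        if not isinstance(quantity, int) or quantity <= 0:
            raise ValueError("quantity must be a positive integer")  # fail loudly here
        return price_order(quantity)

    def price_order(quantity: int) -> float:
        # No Optional[int], no re-checking, no try/except swallowing errors.
        return quantity * 9.99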
Some additional thoughts on GPT-4.5: it provides a BFG-9000 experience - eats e̶n̶e̶r̶g̶y̶ ̶c̶e̶l̶l̶s̶ budget ($2 per call!) like there is no tomorrow, but removes bugs with a blast.
In my experience, the gap between Claude 3.7 and GPT-4.5 is substantial. Claude 3.7 behaves like an overzealous intern on stimulants. It delivers results but often includes unwanted code changes, resulting in spaghetti code with deeply nested conditionals and redundant null checks. Although initial results might appear functional, the resulting technical debt makes subsequent modifications increasingly difficult, often leaving the codebase in disarray. GPT-4.5 behaves more like a mid-level developer, thoughtfully applying good programming patterns.
Unfortunately, the cost difference is significant. For practical purposes, I typically combine models. GPT-4.5 is generally reserved for planning, complex bug fixes, and code refinement or refactoring.
In my experience, GPT-4.5 consistently outperforms thinking models like o1. Occasionally, I'll use o3-mini or DeepSeek R1, but GPT-4.5 tends to be noticeably superior (at least, on average). Of course, effectiveness depends heavily on prompts and specific problems. GPT-4.5 often possesses direct knowledge about particular libraries (even without web searching), whereas o3-mini frequently struggles without additional context.
Wouldn't it be easier, instead of juggling models and their quirks, to just write the code the old way?
Depends.
Sometimes I could solve in 15 minutes a bug I had been chasing for days. In other cases it is simpler to write the code by hand, as the AI either does not solve the problem (even a simple one), or does but at the cost of tech debt, or it takes longer than doing things manually.
AI is just one more tool in our arsenal. It is up to us to decide when to use it. Just because we have a hammer does not mean we need to use it for screws.
> Wouldn't it be easier, instead of juggling [something] and their quirks, to just write the code the old way?
This phrase, when taken religiously, would keep us writing purely in assembly - as there is always "why this new language", "why this framework", "why LLMs".
I have been a religious Cursor + Sonnet user for like past half a year, and maybe I'm an idiot, but I don't like this agentic workflow at all.
What worked for me is having it generate functions and classes, ranging from tens of lines of code to the low hundreds. That way I could quickly iterate on its output and check whether it's actually what I wanted.
It created a prompt-check-prompt iterative workflow where I could make progress quite fast and be reasonably certain of getting what I wanted. Sometimes it required fiddling with manually including files in the context, but that was a sacrifice I was willing to make and if I messed up, I could quickly try again.
With these agentic workflows, and thinking models I'm at a loss.
To take advantage of them, you need very long and detailed prompts, they take a long time to generate and drop huge chunks of code on your head. What it generates is usually wrong due to the combination of sloppy or ambiguous requirements by me, model weaknesses, and agent issues. So I need to take a good chunk of time to actually understand what it made, and fix it.
The iteration time is longer and I have less control over what it's doing, which means I spend many minutes crafting elaborate prompts, reading the convoluted and large output, figuring out what's wrong with it, and either fixing it by hand or modifying my prompt, rinse and repeat.
TLDR: Agents and reasoning models generate 10x as much code, that you have to spend 10x time reviewing and 10x as much time crafting a good prompt.
In theory it would come out as a wash, in practice, it's worse since the super-productive tight AI iteration cycle is gone.
Overall I haven't found these thinking models to be that good for coding, other than the initial project setup and scaffolding.
I think you’re absolutely right and I’ve come to the same conclusion and workflow.
I work on one file at a time in Ask mode, not Composer/Agent. Review every change, and insist on revisions for anything that seems off. Stay in control of the process, and write manually whenever it would be quicker. I won’t accept code I don’t understand, so when exploring new domains I’ll go back with as many questions as necessary to get into the details.
I think Cursor started off this way as a productivity tool for developers, but a lot of Composer/Agent features were added along the way as it became very popular with Vibe Coders. There are inherent risks with non-coders copypasting a load of code they don’t understand, so I see this use case as okay for disposable software, or perhaps UI concept prototypes. But for things that matter and need to be maintained, I think your approach is spot on.
Have you found that this still saves you time overall? Or do you spend a similar amount of time acting as a code reviewer rather than coding it yourself?
Yes, I think so. Often it doesn’t take much more than a glance for simpler edits.
Do you have any Cursor rules defined? Those tend to control its habit of trying to go off the rails and solve 42 problems at once instead of just the one.
Do any of these tools use the rich information from the AST to pull in context? Coupled with semantic search for entry points into the AST, it feels like you could do a lot…
Don’t they all do this? Surely they’re not just doing naive text, n-gram, regex, embeddings, etc, right?
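For what it's worth, the naive version of the AST idea is easy to prototype with Python's stdlib ast module (a toy sketch, not what any of these tools actually ship):

    # Build a tiny "repo map": top-level classes/functions per file, with docstrings.
    import ast
    from pathlib import Path

    def repo_map(root: str) -> dict[str, list[str]]:
        summary = {}
        for path in Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text())
            entries = []
            for node in tree.body:
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    doc = ast.get_docstring(node) or ""
                    first_line = doc.splitlines()[0] if doc else ""
                    entries.append(f"{node.name}: {first_line}")
            summary[str(path)] = entries
        return summary

    # Feed the map to the model first, then pull full files in only when it asks.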
AI blows me away when asked to write greenfield code. It can get a complex task using hundreds of lines of code right on the first try or perhaps it needs a second try on the prompt and an additional tweak of the output code.
As things move from prototype to production ready the productivity starts to become a wash for me.
AI doesn’t do a good job organizing the code and keeping it DRY. Then it’s not easy for it to make those refactorings later. AI is good at writing code that isn’t inherently defective but if there is complexity in the code it will introduce bugs in its changes.
I use Continue for small additions and tab completions and Claude for large changes. The tab completions are a small productivity boost.
Nice to see these tips- I will start experimenting with prompts to produce better code.
parts of the article are spot on. after the magic has worn off i find it's best to literally treat it like another person. would you blindly merge code from someone else or huge swaths of features? no. i have to review every single piece of code, because later on when there's a bug or new feature you have to have that understanding.
another huge thing for me has been to scaffold a complex feature just to see what it would do. just start out with literal garbage and an idea and as long as it works you can start to see if something is going to pan out or not. then tear it down and do it again with those new assumptions you learned. keep doing it until you have a clear direction.
or sometimes my brain just needs to take a break and i'll work on boilerplate stuff that i've been meaning to do or small refactors.
How does the current state of Cursor agentic workflow compare to Windsurf Editor?
I've been using Windsurf since it was released, and back then it was so far ahead of Cursor it's not even funny. Windsurf feels like it's trained on good programming practices (check how a function is used in other parts of the project for consistency, double-check for errors after changes are made, etc.). It's also surprisingly fast (it can "search" a 5k-file codebase in, like, 2 seconds). It even asked me once to copy and paste output from Chrome DevTools because it suspected that my interpretation of the result was not accurate (and it was right).
The only thing I truly wish is to have the same experience with locally running models. Perhaps Mac Studio 512GB will deliver :)
I too liked windsurf better than cursor until ...
I asked it to refactor an authenticatedfetch block of code. It went into a loop, exhausting 15 credits (https://bsky.app/profile/jjude.com/post/3ljuhrxs3442k).
To be honest, I switched from Cursor to Windsurf precisely because of how many fewer credits it uses. Even with daily use, I couldn't even remotely hit the credit limits in Windsurf. Initially they didn't even show how many credits I was using :) - now it's more visible, but for $10 per month I still can't hit the limits, and I'm not restricting myself (not abusing it either).
How much does a credit cost in USD?
It's a fixed price of $10 per month with 500/1500 credits for "premium models" (Claude etc.), and unlimited usage of their own base model.
I saw this post on the first page a few minutes ago (published 5 hours ago), but it quickly dropped to the 5th page. Given its comments and points, that seems odd. I had to search to find it again. Any idea why?
Note that the latest update (0.47.x) made this useful change:
Rules: Allow nested .cursor/rules directories and improved UX to make it clearer when rules are being applied.
This has made things a lot easier in my monorepos.
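In practice that means a layout roughly like this (the paths are just an example from my own setup, not anything Cursor prescribes):

    .cursor/rules/                 global conventions for the whole monorepo
    apps/web/.cursor/rules/        frontend-only rules (component patterns, styling)
    services/api/.cursor/rules/    backend-only rules (error handling, DB access)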
What programming languages do you primarily use? I feel that knowing which programming languages an LLM is best at is valuable but often not directly apparent.
Is there an equivalent to cursorrules and copilot-instructions for the Jetbrains IDEs (Rider) + GitHub Copilot extension?
> Like mine will keep forgetting about nullish coallescing (??) in JS, and even after I fix it up it will revert my change in its future changes. So of course I put that rule in and it won't happen again.
I'm surprised that this sort of pattern - you fix a bug and the AI undoes your fix - is common enough for the author to call it out. I would have assumed the model wouldn't be aggressively editing existing working code like that.
Yeah I have seen this a bunch of times as well. Especially with deprecated function calls. It generates a bunch of code. I get deprecation warnings. I fix them. Copilot fixes them back. I have to explicitly point out that I made the change for it to not reintroduce the deprecations.
I guess code that compiles is easy to train for, but code without warnings less so?
I remember there are other examples of changes that I have to tell the AI I made to not have it change it back again, but can't remember any specific examples.
It's due to a problem with Cursor not updating its state for files that have been manually edited since they were last used in the chat, so it'll think the fix is not there and blindly output code that doesn't have it. The 'apply' model is dumb, so it just overwrites the corrected version with the wrong one.
I think the changelog said they fixed it in 0.46, but that’s clearly not the case.
Yep I asked about this exact problem the other day: https://news.ycombinator.com/item?id=43308153 Having something like “always read the current version of the file before suggesting edits” in Cursor rules doesn’t help, the current file is only read by the agent sometimes. Guess no one has a reliable solution yet.
Cursor in agent mode + Sonnet 3.7 love nothing better than rewriting half your codebase to fix one small bug in a component.
I've stopped using agent mode unless it's for a POC where I just want to test an assumption. Applying each step takes a bit more time, but it means less rogue behaviour and better long-term results IME.
Sounds like a human colleague of mine
> love nothing better than rewriting half your codebase to fix one small bug in a component
Relatable though.
Reminds me of my old co-worker who rewrote our code to be 10x faster but 100x more unreadable. AI agent code is often the worst of both of those worlds. I'm going to give [0] this guy's strategy a shot.
[0] https://www.youtube.com/watch?v=jEhvwYkI-og
If you stopped using agent mode, why use Cursor at all and not a simple plugin for VSCode? Or is there something else that Cursor can do, but a VSCode plugin can't?
I'm sorry, but isn't Cursor just an editor? Maybe an editor shouldn't actually have garbage parts to avoid?
Why not just use an editor that is focused on coding, and then just not use an LLM at all? Less fighting the tooling, more getting your job done with less long term landmines.
There are a lot of editors, and many of them even have native or semi-native LLM support now. Pick one.
Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
Like, a 7900XTX runs you about $1000. You probably already own a GPU that cost more in your gaming rig.
> Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
???
Deepseek R1 doesn't run locally unless you program on a dual socket server with 1 TB of RAM. Or enough cash to have a cabinet of GPUs. The trend for state-of-the-art LLMs is to get bigger over time, not smaller.
Look, I've played with llava and llama locally too, but the benchmarked performance is nowhere near what you can get from the larger cloud providers who can serve hundred-million+ parameter models without quantization.
You wouldn't use full-fledged R1 for coding. There are distilled models based on R1 that get you most of the way there. R1 also doesn't take 1 TB of RAM; go read Unsloth's writeup on how to reduce model size without reducing quality (they got it to fit into 131 GB): https://unsloth.ai/blog/deepseekr1-dynamic tl;dr parameter count is where the statistical model lives or dies, not weight precision; you can't blindly shrink every weight, and tooling is learning how not to butcher models.
Also, performance between cloud-run models and models I've run locally with llama.cpp seems pretty similar. Are you sure your model fit into your VRAM, and that nothing else was misconfigured? Not fitting into VRAM slows everything to a halt. All the coder models worth looking at fit into 24 GB cards in their full-sized variants with the right quantization.
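As a data point, running a quantized coder model locally is only a handful of lines with llama-cpp-python (a sketch; the model file name and parameters are placeholders for whatever fits your card):

    # Assumes a GGUF quantization downloaded locally and a GPU build of llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-32b-instruct-q4_k_m.gguf",  # example file name
        n_gpu_layers=-1,  # offload everything; it must fit in VRAM or it crawls
        n_ctx=8192,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a function that parses ISO 8601 dates."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])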
> All the cutting edge models are open weight licensed, and run locally.
No? from https://lmarena.ai/ coding:
    Rank*  Model                            Score  Org        License
    1      Grok-3-Preview-02-24             1414   xAI        Proprietary
    1      GPT-4.5-Preview                  1413   OpenAI     Proprietary
    3      Gemini-2.0-Pro-Exp-02-05         1378   Google     Proprietary
    3      o3-mini-high                     1369   OpenAI     Proprietary
    3      DeepSeek-R1                      1369   DeepSeek   MIT
    3      ChatGPT-4o-latest (2025-01-29)   1367   OpenAI     Proprietary
    3      Gemini-2.0-Flash-Thinking-Exp    1366   Google     Proprietary
    3      o1-2024-12-17                    1359   OpenAI     Proprietary
    3      o3-mini                          1353   OpenAI     Proprietary
    4      o1-preview                       1355   OpenAI     Proprietary
    4      Gemini-2.0-Flash-001             1354   Google     Proprietary
    4      o1-mini                          1353   OpenAI     Proprietary
    4      Claude 3.7 Sonnet                1350   Anthropic  Proprietary
Yes, I'm aware of various rankings. Try all of those models on something that isn't commonly used on a benchmark, and you'll notice that a lot of the proprietary models have trouble actually producing statistically relevant results.
The only one that I've come across that makes me think LLMs will maybe be useful someday is Deepseek R1 and the redistillations based on it.
I've seen HN's fascination with OpenAI's products, and I can't understand why. Even with o1 and o3, they're always too little, too late; somebody else is already doing something better and throwing it into an HF repo. Must be the Silicon Valley RDF at work.
Cursor overwrites the "code" command-line shortcut/alias that's normally set by VS Code. It does this on every update, with no setting to disable the behavior. There are a number of forum threads asking about manual workarounds. This seems like a deliberately anti-user feature meant to get their usage numbers up at all costs, and this small thing makes me not trust that Cursor's decision-making process won't sell me out as a user.
This is the primary reason I uninstalled Cursor and subsequently realized that, hey, VS Code has most of these features now.
What in the hell were they thinking?!
So, if I liked being a manager more than a developer, I'd use Cursor, and lean in entirely on AI?
Yes; it can be used in agentic mode, and along with the joys it also has a few of the frustrations that will be familiar if you have managed human devs.
If you don’t understand what it outputs then it’s just random garbage.
The new Cursor update (0.47) is cursed. They got rid of codebase searching (WTF?) and the agent is noticeably worse, even when using Sonnet 3.5.
I'm really shocked, actually. This might push me to look at competitors.
I tried cursor for a day or two and then asked for a refund... here's why:
* It has terrible support for Elixir (my fav language) because the models are only really trained on Python.
* Terrible clunky interface... it would be nice if you didn't have to click around, do modifier ctrl + Y stuff ALL the time.
* The code generated is still riddled with errors or is naff (apart from boilerplate)... so I am still *prompt engineering* the crap out of it, which I'm good at, but I can prompt engineer using phind.com...
* The fact that the code is largely broken the first time, and that they still haven't really fixed the context window problem, means you have to copy and paste error messages back into it, defeating the purpose of an integrated IDE imo.
* The free demo mode stops working after generating one function... if I had been given more time to evaluate it fully I would never have signed up. I signed up to see if it was any good.. which it isn't.
Too bad they removed the ability to use Chat (rebranded as Ask) with your own API keys in version 0.47. Now every feature requires a subscription.
Natural for Cursor to nudge users towards their paid plans, but why provide the ability to use your own API keys in the first place if you're going to make them useless later?
Nice. Also, you can use project-specific structure and markdown files to ensure the AI organizes content correctly for your use case. We are using it on 800k lines of Golang and it works well. https://getstream.io/blog/cursor-ai-large-projects/
Will you be able to share those ai_*.md files?
Very useful!!!
Cline is much better
Just use Cline, it beats Cursor hollow — saves me like hours per day