
2 AI or not 2 AI... Is really not the question

There's been a new bit of drama in Node.js core.

This really shouldn't come as a big surprise, there's always at least some bit of drama in Node.js core.

This time, the drama is all about use of AI agents to generate code for Node.js core. Recently, Node.js TSC Chair and Platformatic Co-founder Matteo Collina opened a pull request to add a new 18k LOC Virtual File System module. The code was written with the assistance of an AI agent, which Matteo readily acknowledged in the pull request description.

The PR was opened on January 22 of this year. Since then, there's been a ton of deep technical review and discussion about the code, the design, etc., and it currently has enough sign-offs, and no blockers, to be merged. It's certainly not perfect and there's still lots of work to be done on it, but the PR as it stands at the moment is ready to go.

So what's the drama? Former Node.js TSC member and core contributor Fedor Indutny raised the question, "Does this PR adhere to Developer's Certificate of Origin 1.1 described in CONTRIBUTING.md?".

It's a good question worth asking. tl;dr The answer is yes, it does. But let's go through each of Fedor's points.

First let's look at the DCO. Here's the full text:

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
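For context on how this attestation works mechanically: in DCO-based projects like Node.js, each commit carries a `Signed-off-by` trailer, which git can append automatically. A minimal sketch, assuming `git` is installed; the repository and identity here are hypothetical, purely for illustration:

```shell
# Create a throwaway repo to demonstrate the DCO sign-off trailer.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Contributor"    # hypothetical identity
git config user.email "jane@example.com"

echo "hello" > file.txt
git add file.txt

# -s / --signoff appends the DCO attestation trailer to the commit message.
git commit -q -s -m "Add file"

# The commit message now carries the attestation:
git log -1 --format=%B | grep "^Signed-off-by"
# Signed-off-by: Jane Contributor <jane@example.com>
```

By adding that trailer, the contributor is certifying the four points above for that commit; there is no separate signature or paperwork involved.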

Fedor first calls out point (a) and says that it "[d]oes not seem to apply" on the grounds that the code was not created in whole by Matteo. Let's put aside the fact that Matteo did directly contribute to the code, which satisfies the "in part" language there, and instead focus on the "right to submit it" part.

Consider a hypothetical scenario where, instead of using an AI agent to write the code, Matteo hired a contractor or one of his employees at Platformatic to write the code for him. In that scenario, Matteo would still be within his rights to submit the code under the DCO. It has long been established that paying a contractor to write code for you does not disqualify you from claiming ownership/copyright of that code. The same is true for tool-generated code. No one would question if I used a template engine to generate the code for a web page, I can still claim ownership of the resulting code. AI is just another code generation tool, albeit a much more powerful one.

But what about the fact that AI models are trained on public code, including code from projects with incompatible licenses? Or maybe even, gasp, copyrighted material like books, articles, etc., without the author's permission? That is a common and valid concern, but it doesn't change the calculus here. The DCO does not require that the contributor be the sole author of the code; it only requires that the contributor have the right to submit it, and that attestation makes them personally legally responsible for the claim. That is, it is Matteo's responsibility to ensure that the code being submitted does not violate any licenses or copyrights.

Let's go back to the hypothetical contractor scenario. If the contractor had plagiarized code from an incompatible license and included it in the code they wrote for Matteo, and Matteo then submitted that code under the DCO, Matteo would be the one legally responsible for that violation, not the contractor. AI does not change that dynamic. If the AI model generated code that included plagiarized material, and Matteo submitted that code under the DCO, Matteo would be the one legally responsible for that violation, not the AI model.

So to answer Fedor's first point: yes, point (a) does apply. If someone can show evidence that the code in the PR includes material that violates the project's IP policy, or material from any other source that Matteo does not have the right to submit, then that's a reason to block the PR. Nothing of the sort has been shown. IP concerns are valid with or without the use of AI.

Fedor says that the second point, (b), "[a]lso doesn't apply because this is based on previous work, but no one can assert knowledge of the licensing terms of the code it is based on". However, Fedor has not shown any evidence that the code in question is "based on previous work" in the sense that it could be considered a derivative. Is the possibility there? Yes, but that's true of any code written by a human as well. If a human contributor wrote code that was based on incompatibly licensed code, the concern would be exactly the same.

Fedor also says that point (c) doesn't apply, and here we agree. The code was not provided by a third person; it was generated by Matteo, who was using an AI agent as a tool. So point (c) does not apply, but that doesn't mean the DCO is not satisfied.

As a follow-up, Fedor has opened a petition to try to convince the TSC not to allow "slop" in Node.js. Specifically:

We, the undersigned, petition the Node.js Technical Steering Committee (TSC) to vote NO on "Is AI-assisted development allowed?" and not accept LLM assisted rewrites of core internals.

Node.js is a critical infrastructure running on millions of servers online and supporting engineers through command-line utilities that they use daily. We believe that diluting the core hand-written with care and diligence over the years is against the mission and values of the project and should not be allowed. Accepting LLM changes to Node.js core would break the reputational bedrock of public contributions that have brought Node.js to its current public standing and societal value.

Submitted generated code should be reproducible by reviewers without having to go through the paywall of subscription based LLM tooling.

Let's put aside the fact that the last sentence there is a non sequitur relative to the rest of the petition (if the ask is that no AI-assisted development be allowed at all, then it makes no difference whether the tooling is subscription-based or not).

This petition is flawed for two very fundamental reasons:

  1. The code is not "slop". The term "slop" for AI-generated code is a pejorative used to imply that the code is of low quality, hasn't been carefully reviewed, doesn't follow the same standards as a human-written alternative, etc. The code in this PR has been thoroughly reviewed by multiple core contributors. It still has bugs, yes, but that is true of lots of code, especially brand new, highly complex experimental submissions like this one. Matteo didn't just "vibe code" 18k lines and submit them without careful review and consideration. He followed a very careful human-in-the-loop process, and the resulting PR has gone through many rounds of review, discussion, and revision. The foundation of Fedor's petition is a false premise: that all AI-assisted code is "slop". That's just not true, and in this case, it's demonstrably not true.

  2. The second issue is that the position the petition takes is simply unenforceable. If the TSC were to vote "NO" on "Is AI-assisted development allowed?", what would that actually mean in practice? Would a PR be rejected if its code was generated with the assistance of an AI agent? First, how could we tell the difference? AI-assisted code currently does often include a number of tell-tale signposts, such as overly verbose comments and cliché patterns, but as AI models continue to improve, those signposts will become less and less reliable. A contributor could also easily just modify the generated code to remove those indicators. The TSC could demand that contributors disclose whether they used AI assistance, but a contributor could just as easily lie about it. Developers who are using AI assistance ethically and responsibly have no incentive to lie (which is why Matteo was transparent about it in the PR description), but if the TSC were to take a hardline stance against AI-assisted development, that would create an incentive for contributors to simply ignore the rule.

It's all well and good to have high-minded ideals about not "diluting the core hand-written with care and diligence", but I can tell you from a decade of experience working with the Node.js source code that hand-written is not always synonymous with "care and diligence".

In the end, the proposed petition just comes across as "AI bad, human good" fear-mongering without any real substance behind it.

The real question is not "Is AI-assisted development allowed?", the real question is "What are the policies and best practices we should put in place to ensure that AI-assisted development is done ethically and responsibly, and that the resulting code is of high quality and does not violate any licenses or copyrights?". The Node.js core contribution guidelines and DCO already provide a strong foundation and framework for that. Should the TSC consider adding some additional language about responsible disclosure of AI assistance? I'd say definitely yes, but that would solely be about transparency and honesty, not about making silly claims that AI-assisted code is inherently "slop" and of lower-quality than human-written code.

Anyway, the TSC will be talking about this more in the coming weeks.