In this article, we will compare five of the most advanced AI bots: GPT-4, Bing, Claude+, Bard, and GitHub Copilot. We will examine how they work, their strengths and weaknesses, and how they compare to each other.
Testing the AI Bots for Coding
Before we dive into comparing these five AI bots, it’s essential to understand what an AI bot for coding is and how it works. An AI bot for coding is an artificial intelligence program that can automatically generate code for a specific task. These bots use natural language processing and machine learning algorithms to analyze human-written code and generate new code based on that analysis.
To start, we are going to test the AIs on a hard LeetCode question; after all, we want them to be able to solve complex coding problems. We also wanted to test them on a less well-known question. For our experiment, we will use LeetCode 214, Shortest Palindrome.
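For context, LeetCode 214 asks for the shortest palindrome that can be formed by adding characters in front of a given string. The sketch below is one common way to solve it, using the KMP prefix function. It is not the output of any of the bots tested here; the function name `shortest_palindrome` is illustrative, and it assumes the input never contains the `#` separator character.

```python
def shortest_palindrome(s: str) -> str:
    """Return the shortest palindrome formed by adding characters in front of s."""
    rev = s[::-1]
    # Assumption: '#' never occurs in s, so it safely separates the two halves.
    combined = s + "#" + rev

    # prefix[i] = length of the longest proper prefix of combined[: i + 1]
    # that is also a suffix of it (the classic KMP prefix function).
    prefix = [0] * len(combined)
    for i in range(1, len(combined)):
        k = prefix[i - 1]
        while k > 0 and combined[i] != combined[k]:
            k = prefix[k - 1]
        if combined[i] == combined[k]:
            k += 1
        prefix[i] = k

    # The last value is the length of the longest palindromic prefix of s.
    palindrome_len = prefix[-1]
    # Prepend the reverse of whatever is not covered by that palindromic prefix.
    return rev[: len(s) - palindrome_len] + s
```

For example, `shortest_palindrome("abcd")` returns `"dcbabcd"`, since only the first character of `"abcd"` is already a palindromic prefix.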
[GPT-4] GPT-4 is highly versatile in generating code for various programming languages and applications. The caveats are that it takes much longer to respond, and API usage is a lot more expensive, so costs could ramp up quickly. Overall, it got the answer right and passed the test.
[Bing] The submission passed all the tests. It beat 47% of submissions on runtime and 37% on memory. This code looks a lot simpler than what GPT-4 generated. It beat GPT-4 on memory and it used less code! Bing seems to have the most efficient code so far; however, it gave only a very short explanation of how it solved the problem. Nonetheless, it is the best so far.
[Claude+] The code does not pass the submission test: only 1 of 121 test cases passed. Ouch! This one seemed promising, but it looks like Claude+ is not that well suited for programming.
[Bard] To start off, I had to manually insert the “self” argument into the function, since Bard didn’t include it. Even with that fix, Bard’s code did not pass the submission test, passing only 2 of 121 test cases. An unfortunate result, but it’s safe to say that, for now, Bard isn’t much of a coding expert.
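To illustrate why a missing “self” matters: LeetCode's Python template calls the solution as a method on a `Solution` instance, so a method defined without `self` fails before any logic runs. The sketch below is illustrative only. The class names `WithoutSelf` and `WithSelf` and the helper `try_call` are hypothetical, and the method bodies are placeholders rather than Bard's actual code.

```python
class WithoutSelf:
    # "self" is missing, so the instance gets bound to the parameter s
    # and the real argument becomes an unexpected second argument.
    def shortestPalindrome(s: str) -> str:
        return s  # placeholder body, not Bard's code


class WithSelf:
    # Adding "self" makes this a normal instance method.
    def shortestPalindrome(self, s: str) -> str:
        return s  # placeholder body, not Bard's code


def try_call(solution_cls, s: str) -> str:
    """Call the method the way an online judge would: on an instance."""
    try:
        return solution_cls().shortestPalindrome(s)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_call(WithoutSelf, "abcd"))  # reports a TypeError about argument count
print(try_call(WithSelf, "abcd"))     # the call itself succeeds
```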
[GitHub Copilot] The code passes all the tests. It scored better than 30% of submissions on runtime and 37% on memory.
For fun, you can see the coding examples (with and without comments) that each AI produced in the source article.
Source: How AI Bots Code: Comparing Bing, Claude+, Co-Pilot, GPT-4 and Bard | HackerNoon