GitHub Copilot: A revolution in disguise?
When I first saw the powers of GitHub Copilot, it could only be compared to a Christmas miracle. The developer simply asked for functionality inside a comment, which stated that he wanted a tree-sort algorithm. Nice try, Mr. Developer, but life ain't that easy. You actually have to write the functionality yourself, or shamelessly lift someone else's solution from Stack Overflow.
But there it was: Copilot seamlessly suggested 8 lines of code, with great variable names, proper indentation and the correct output, within seconds.
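The article doesn't reproduce the suggestion itself, but a tree sort of the kind Copilot tends to produce looks roughly like this (a minimal Python sketch written for illustration; the function name and structure are my assumptions, not the actual Copilot output):

```python
# tree sort: insert values into a binary search tree, then read them back in order
def tree_sort(values):
    def insert(node, value):
        # each node is a small dict; None marks an empty subtree
        if node is None:
            return {"value": value, "left": None, "right": None}
        if value < node["value"]:
            node["left"] = insert(node["left"], value)
        else:
            node["right"] = insert(node["right"], value)
        return node

    def in_order(node, out):
        # left subtree, then node, then right subtree yields sorted order
        if node is not None:
            in_order(node["left"], out)
            out.append(node["value"])
            in_order(node["right"], out)
        return out

    root = None
    for v in values:
        root = insert(root, v)
    return in_order(root, [])

print(tree_sort([5, 2, 8, 1, 9]))  # [1, 2, 5, 8, 9]
```

Whether Copilot's actual eight lines looked anything like this is beside the point; the point is that a comment alone produced working code.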
According to Microsoft, Copilot is the product of billions of lines of publicly available code used to train a natural-language AI model based on OpenAI's Codex. To put it simply, natural language is formed through humans interacting with each other. It's the language we create and evolve naturally through use and repetition, without any conscious planning or premeditation. It's how humans create specific cultures around language, where people five miles apart may use language in one way while their neighbours use it in another. The opposite is the language we created for programming computers or studying logic, as it is a language with a specific set of rules, i.e. a syntax. Copilot simply translates our sloppy, often badly formed sentences into (more often than not) working computer code.
Copilot in action
Let's take an example:
Suppose you want to create a function that holds a list of the seven biggest cities in the world, sorts that list and prints it to the console.
Normally this means defining the function, giving it a name, assigning variables, implementing the sorting functionality and returning the values we want. At the same time, it all has to comply with the syntax the language has predefined. With Copilot, this almost appears inefficient and slow, even though it's the basis for all coding.
Here is that same example, with Copilot for Visual Studio Code:
- Ask Copilot to give us a function that holds the list
- Ask it to sort the list by population
- Ask it to remove the cities that contain an "m"
- Ask it to log the results
- Run the solution
As shown in the example above, you simply write in a comment what you want the code to achieve, press ENTER and wait for a suggestion. Then press TAB to accept the suggestion. And voilà! The code is applied in your solution and, hopefully, works as intended.
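The steps above end up producing a file that looks something like this, comments first, Copilot-style completions after (a hypothetical Python sketch; the city names and population figures are illustrative assumptions, not data from the article):

```python
# a function that holds the top 7 biggest cities (name, population in millions)
def top_cities():
    cities = [
        ("Tokyo", 37.4),
        ("Delhi", 30.3),
        ("Shanghai", 27.1),
        ("Sao Paulo", 22.0),
        ("Mexico City", 21.8),
        ("Cairo", 20.9),
        ("Mumbai", 20.4),
    ]
    # sort the list by population, biggest first
    cities.sort(key=lambda city: city[1], reverse=True)
    # remove the cities that contain an "m"
    cities = [city for city in cities if "m" not in city[0].lower()]
    # log the results
    for name, population in cities:
        print(f"{name}: {population} million")
    return cities

top_cities()
```

Each comment plays the role of a prompt; in the editor, Copilot would fill in the line or block beneath it.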
An important thing to note is that the AI isn't perfect. You will occasionally get bad or inefficient suggestions, so be wary and make sure to review the code before you apply it.
Even though this is a simple exercise for most developers, the example above shows what a productivity boost Copilot can be in a developer's day-to-day work. While interacting with Copilot, it becomes obvious that if you know how to phrase your needs effectively, a good or at least acceptable suggestion is almost always given to you.
And according to GitHub, the intended productivity boost is showing. In a study of 2,000 U.S.-based developers conducted by the company itself, using Copilot correlated with improved developer productivity. According to their data, the developers who reported increased productivity also accepted the most suggestions from Copilot, suggesting that the code they applied had real value.
My personal experience with the tool has been excellent so far and a big positive is the fact that it exposes developers to new ways of solving problems. For the curious developer, Copilot will give you more tools and methods to use in other contexts that you haven't explored or found yet. It is truly a repository for learning in addition to being a great productivity tool.
Some dilemmas arise
But is it all good? When trying to understand Copilot's implications on software development and developers in general, it seems that the conversation revolves around these two main themes:
- Is Copilot going to replace Software Developers completely?
- Is the AI ethical?
This is my take on the themes above:
Are you being replaced?
Whilst any new advancement in AI benefits from discussions like these, I think the amount of time spent on this subject is misplaced with GitHub Copilot. A view that I hopefully share with a majority of people is that the AI movement should improve the general quality of life of the humans using it, and that AI should merely be an extension of our own capabilities, not a competing force. In the realm of Copilot, the overwhelming opinion is that it simply is not powerful enough to come close to being a direct competitor to developers, and I fully agree.
For the time being, Copilot provides impressive assistance for developers tackling problems, but it still needs human intervention to apply suggestions within each project's unique scope. At the time of writing this article, the even more impressive AI tool ChatGPT was released to the public. With a wide array of capabilities and almost alien computing power, it shows the same kinds of limitations that AI currently has, where human intervention is almost always a given.
Is GitHub Copilot ethical?
As mentioned earlier in this article, the data used to create Copilot was publicly available code from public repositories on GitHub itself and other sources. Since the data was already public from the beginning, is GitHub in the right to use that public data to construct its own service? The discussion seems to be very divided. Some people regard the use of public code as fair use, since the code is not stolen and copied into the solution outright but used to train a new AI entity or service. Others argue that GitHub and Microsoft are in direct violation of several copyright and copyleft licenses that the companies have ignored for years. As the discussion is heavily divided, I will focus my take on the fairness aspect.
Is it fair to developers that their own public code, which is their intellectual property in one way or another, is used to train machines to create more code? In my opinion, in the case of GitHub Copilot, no.
Copilot has exposed one of many ethical imbalances in modern AI: the balance of rights between individuals' public activities on the internet and corporations using "user-generated content" to train their own AIs. How many developers were informed and given the ability to consent to their public data being used by GitHub Copilot in the first place? Going by GitHub users' own concerns in the GitHub forums, the option to reject the collection of data was only added after the technical preview was launched. It was also only after this launch that users were allowed to disable the collection of their code snippets, which is the prerequisite data for training the AI itself.
Even though GitHub touts responsible practices for processing data, the collection of code snippets introduces other concerns, namely the risk of sensitive information being sent to the training model. Data like API keys, authentication strings or other personal information could be sent to the training model for processing. As much as developers have a responsibility not to expose sensitive information in public repositories, GitHub should shoulder some responsibility for making sure that sensitive data is discarded. GitHub itself admits that since code snippets are collected, sensitive information will be collected if it is included in the snippet. In rare occurrences, that data might even appear verbatim in the output (code suggestions).
The fact that GitHub Copilot is currently behind a paywall adds to the sense of unfairness. It is no surprise that free-software developers, who contributed code to free-software projects, are concerned that their code is being incorporated into an AI tool that comes at a price. It sets a bad precedent, especially since users' free contributions are the only reason the tool exists in the first place. As for these fairness issues, it is still unclear whether the company will face any consequences at all. It is very clear, however, that GitHub Copilot is not in the business of driving ethical AI forward, but seems to act on a philosophy of "goodwill" to mask these issues.
Dilemmas aside, GitHub Copilot is truly a fantastic tool for learning new ways of tackling problems and becoming a more effective developer. It is a beautiful and hopefully persistent idea: using user-generated content to create tools that help the users themselves.
Even though the tool brings economic value to the corporation itself, the business model seems to be based on the tool being helpful, which hopefully creates even more value for everyday developers as it grows.
It will probably not replace software developers at all, but will most likely become an addition to the developer's toolbox. The implications for developer productivity are clear, but the implications for privacy and ethics still cause justified anger and frustration.