The journey of AI began in the 1960s, when scientists attempted to replicate the functioning of neurons and brain circuits in technical models.
As a result of the limited computing power and memory capacity, these early attempts failed. This led to a so-called “AI-Winter”.
During this time, research into AI was largely discontinued and those who persisted were often ridiculed.
The breakthrough with ChatGPT
The introduction of ChatGPT triggered a new wave of hype around AI. Although natural language processing (NLP) existed long before,
ChatGPT added a new dimension to the world of AI.
Suddenly a model was available that could solve tasks previously reserved for humans.
The ability of GPT-3.5 and later GPT-4 to pass school and university exams marked a significant breakthrough and generated a great deal of attention,
which quickly grew into an AI hype.
Superpowers for everyone? Or the end of work?
The possibilities created by AI are manifold. From text creation in marketing and code creation to image, music and video production - the
use cases are numerous and the speed of development is breathtaking. One notable example is the controversy concerning an AI-generated image
that won an art competition. This discussion raises questions about copyright and the threshold of originality.
Recently, studies have repeatedly discussed the potential of AI models to take over areas of human work. This raises the frequently asked question
of whether AI models like ChatGPT complement or replace human work.
In contrast to previous automation processes, which mainly replaced repetitive and simple tasks, new AI models seem to be able to take
on complex and creative tasks.
How will this impact the work of agile teams? Will we be supplemented by tools or will Agent Smith take over?
The impact on software development
The possibilities of using AI in the field of software development are increasingly coming into focus. For some time now,
developers have been able to use tools such as GitHub Copilot and Tabnine to write code faster and more efficiently,
to get familiar with unknown code more quickly, learn new frameworks or even find errors.
Anyone who has used these tools knows that they can be an enormous help, but they do not always lead
to the desired results. This has not been a threat to developers so far: the Copilot is not able
to design, develop, test and document complex software products from start to finish. But this could change.
The limits of the current tools
The reason for this rather limited performance is that the tools work in single-shot mode.
This means that they are given a task and have to complete it in one go.
They are given context, i.e. they can view the existing code in the project, but they cannot break
the task down into several sub-steps and complete and improve them incrementally with feedback.
To visualize the effect of this restriction, consider the following case: You are given the task of writing
a program with a certain functionality. However, you only have one attempt and have to write it from top
to bottom in one go. It is not possible to make corrections by using the cursor keys or the delete key.
Furthermore, you cannot start the program in order to test it.
This approach may still work for simple tasks, but it quickly becomes impractical for more complex tasks.
After all, this is not how humans work: we read the task, think of a solution, break down the task into steps,
write a part of the code, test it, correct it, continue writing, refactor and thus iteratively approach
a complete, working solution.
What if we could transfer this way of working to tools based on language models?
This is where the idea of agent-based workflows comes into play.
The role of agents
Agent-based workflows are a new approach in AI development. They make it possible to solve complex problems
through the collaboration of specialized agents. For example, one agent can act as a planner, while other agents
take on specific tasks such as writing code or creating documentation.
Agents can also interact with each other in feedback loops, critically review each other’s work, suggest
improvements and thus support each other in building better solutions.
Here we see the concept of an agent as a diagram. A human interacts with agents and communicates the task.
The agent is assigned a specific role via its system prompt, e.g. “You are a senior Angular front-end developer”.
It also has access to additional context, helpful information, additional tools and a language model to complete the assigned task.
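To make this more concrete, the following minimal Python sketch shows what such an agent could look like. The names (Agent, ask_llm, the example context and tools) are purely illustrative and not taken from any specific framework; the language model is replaced by a dummy function so the snippet runs on its own.

from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    role: str                     # the system prompt defining the agent's role
    llm: Callable[[str], str]     # any text-in/text-out language model
    context: str = ""             # additional project context (existing code, guidelines, docs)
    tools: Dict[str, Callable] = field(default_factory=dict)  # e.g. file access or a test runner

    def run(self, task: str) -> str:
        # Compose role, context and task into a single request to the language model.
        prompt = f"{self.role}\n\nContext:\n{self.context}\n\nTask:\n{task}"
        return self.llm(prompt)

def ask_llm(prompt: str) -> str:
    # Dummy stand-in for a real model call, so the sketch runs without an API key.
    return f"[model response to a prompt of {len(prompt)} characters]"

frontend_agent = Agent(
    role="You are a senior Angular front-end developer.",
    llm=ask_llm,
    context="Existing components: time-entry-list, time-entry-form",
)
print(frontend_agent.run("Implement a component that lists all time entries."))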
An example of a software engineering agent with a corresponding environment and interfaces is Princeton
University’s SWE-Agent project, which provides an agent that autonomously fixes bugs and resolves issues in GitHub
repositories in a self-supervised manner.
To put this in context: according to the SWE-Bench benchmark, the SWE-Agent achieves a complete
solution for 12.99% of the tasks.
However, this alone would not be real progress compared to the previous approach. The real potential of
agent-based workflows unfolds when we employ a team of agents in different roles working together on a complex topic.
Here we can see an example of an agent-based workflow. A planning agent coordinates the collaboration of various
specialized agents, each of which is responsible for specific tasks. This collaboration allows complex
problems to be broken down into individual, manageable steps and solved by specially configured agents.
The architecture of the project is designed by an agent that is tailored to the specific best practices and
specifications of the project (e.g. via Prompt Engineering or Retrieval Augmented Generation).
The architectural design is critically reviewed by a second agent that specializes in compliance with quality
standards, best practices, and architectural and security guidelines.
The Angular agent shown in the graphic above takes over the implementation of the front-end stories.
It is supported by a review agent which checks completeness and quality of the code and makes suggestions for improvement.
Tests are created and executed by a specialized test agent. Any errors found are reported back to the implementation
agent and rectified by the latter. Another agent is responsible for creating the documentation.
Additional performance can be gained by equipping the agents with language models specially
tuned to their respective tasks. For example, an agent that is responsible for generating code can use a language
model that is specifically tailored to the syntax and semantics of programming languages (e.g. through model
fine-tuning).
In theory, this concept can be used to build a complete team of specialized agents which work together in
different roles and support each other via feedback loops to jointly build a complex project. This approach
promises substantially higher performance than the simple single-shot approach.
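As a rough illustration of this idea, the following sketch wires a hypothetical planner, implementer and reviewer together in a feedback loop. All names and the placeholder model are assumptions made for the sake of the example, not part of any of the tools discussed below.

from typing import List

def ask_llm(prompt: str) -> str:
    # Stand-in for a real language model call; always answers "ok" so the sketch terminates quickly.
    return "ok"

def planner(requirement: str) -> List[str]:
    # A planning agent would ask the model to break the requirement into ordered sub-tasks.
    ask_llm(f"Break this requirement into sub-tasks: {requirement}")
    return ["design the API", "implement the front end", "write the documentation"]

def implementer(task: str, feedback: str) -> str:
    # An implementation agent produces an increment, taking reviewer feedback into account.
    return ask_llm(f"Implement: {task}. Previous feedback: {feedback}")

def reviewer(task: str, increment: str) -> str:
    # A review agent either accepts the increment or returns improvement suggestions.
    verdict = ask_llm(f"Review the result for '{task}': {increment}")
    return "" if verdict == "ok" else verdict

def run_workflow(requirement: str, max_rounds: int = 3) -> None:
    for task in planner(requirement):
        feedback = ""
        for _ in range(max_rounds):
            increment = implementer(task, feedback)
            feedback = reviewer(task, increment)
            if not feedback:  # the reviewer accepted the increment, move on to the next task
                break
        print(f"{task}: accepted")

run_workflow("A simple time-tracking tool with user accounts and reports")

The important point is the inner loop: an increment is only accepted once the reviewing agent no longer returns feedback, which is exactly what the single-shot mode of today’s assistants cannot do.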
What do agentic workflows look like in practice?
Practical applications
It is clear that agent-based workflows are already being used in various areas today. Companies such as Microsoft
and Meta are experimenting with these technologies, at Meta for example to improve unit test coverage.
Startups are demonstrating their vision of agent-based solutions for complex software engineering tasks and are
currently being funded generously, such as Devin from Cognition Labs.
GitHub is launching a currently still private “Technical Preview” of GitHub Copilot Workspaces, which is
intended to enable collaboration between developers and AI models in an integrated development environment.
However, there are also some open-source projects, such as GPT-Pilot, ChatDev, and Devika, which
translate our theoretical concept of a team of agents into practice and illustrate the potential of this
new approach to work.
Experiment: GPT-Pilot
We took a closer look at GPT-Pilot and conducted an experiment to experience what working with a team
of agents feels like in practice.
GPT-Pilot is a command-line tool written in Python that can create entire apps. For this purpose, it defines
a team of ten agents with different tasks:
Product Owner Agent: Responsible for breaking the project down into tasks.
Specification Writer Agent: Asks questions to better understand the requirements if the project description is not sufficient.
Architect Agent: Writes down the technologies that will be used for the app and checks whether they are installed on the computer. If not, it installs them.
Tech Lead Agent: Writes development tasks that must be implemented by the developer.
Developer Agent: Takes each task and describes what needs to be done to implement it. The description is in human-readable form.
Code Monkey Agent: Takes the developer’s description and the existing file and implements the changes.
Reviewer Agent: Checks every step of the task and, if something is done wrong, sends the task back to the Code Monkey.
Troubleshooter Agent: Helps to give good feedback to GPT-Pilot if something is wrong.
Debugger Agent: In the case of an error, tries to find the cause from the information provided and gives advice on how to rectify it.
Technical Writer Agent: Writes documentation for the project.
In addition to the agent roles, GPT-Pilot defines a workflow that coordinates agent collaboration, enables feedback loops and monitors progress.
Specification of the requirements
The creation of a sufficient description of the desired result is - as in “real life” - not trivial:
in addition to the functional requirements, specifications for the technical basis (e.g. framework, database),
the architecture and structure of the project should also be considered,
as well as non-functional requirements such as performance, scalability and security.
If you now feed GPT-Pilot with your specification, it may come back with questions or need clarification.
Once the task has been sufficiently described from the agent’s point of view, the agent begins to structure the project
and breaks it down into tasks and work packages.
We defined the requirements for a simple time-tracking tool named “TimR”, inspired by the
example in the GPT-Pilot Wiki.
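For illustration, a shortened specification for such a tool could look like the following. This is a sketch of the kind of description we mean, not the exact text we fed into GPT-Pilot:

TimR is a simple web-based time-tracking tool.
Users can register and log in.
Logged-in users can create, edit and delete time entries (project, description, start and end time).
A report page shows the recorded time per project.
Technical basis: Node.js back end with a MongoDB database and a simple HTML front end.
Non-functional requirements: the app runs locally and needs no external services apart from MongoDB.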
Implementation
Once the steps and tasks have been defined, the implementation agents take over.
After the successful implementation of an increment and positive review by another agent,
test cases are created, which the human client should then carry out and, if necessary,
give feedback in the form of error reports (actual/desired behaviour, error messages).
In other words: the agents ask the human to start the current application and execute the
test cases manually.
The feedback is then returned to GPT-Pilot where it undergoes a multi-stage analysis by
several agents. These agents propose changes, correct errors, review the status, and then
resubmit the revised application for testing along with updated test cases.
In this manner, each larger task is broken down into smaller, manageable steps and is
iteratively processed by specialized agents. Humans are involved in the process at certain
points in order to check the quality of the results and provide feedback.
In our experiment, completing the project implementation required approximately one hour,
utilizing the OpenAI GPT-4 model, and incurred a cost of about EUR 10 for the tokens consumed.
It was remarkable that the provided increments were executable at all times and that the
small application was created step by step in sensible stages.
Throughout the manual testing phase, several issues were encountered, such as absent menu options
and improperly implemented API endpoints. These issues were swiftly identified and analyzed by
the designated agents, leading to corrections by the implementation agent. Subsequent tests confirmed
the successful resolution of these errors.
The functionality and design of the features are currently basic; the reporting charts and error pages
in particular need enhancement. Exploring potential improvements through refined requirements could yield
significant advancements.
The repository, containing the generated code and documentation (with the sole exception of an update
to the README file), is available on GitHub.
To set up and run the application, follow these steps:
Install the necessary dependencies.
npm install
Before you are able to start the application, you’ll need a MongoDB instance, which you can simply start with Docker.
docker run --name mongodb -d -p 27017:27017 mongodb/mongodb-community-server:latest
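If the generated project defines the usual start script in its package.json (an assumption on our part, as this may differ depending on the generated code), the application can then be started with:
npm start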
On the login screen, start by creating a new user account. Once the account is set up, you can log in to begin
recording your initial entries.
Findings
It was impressive to see how the “team of agents” analyzed the problem, structured it and broke it down into small, manageable steps.
The cooperation of the agents was efficient and goal-oriented, and the quality of the results is astonishingly high.
Errors in the implementation were also quickly rectified following appropriate feedback.
At first glance, the created project meets the requirements and has a sensible structure.
The provided documentation makes it easy to get started.
We would also like to see a test suite that can be used to validate the functionality
and to verify changes and refactorings. This could probably be achieved by adding
a test automation agent to the team.
This experiment raises pivotal questions about the trajectory of software development.
It prompts us to consider whether significant portions of the software development lifecycle
could be autonomously managed by collaborative agents. These agents might independently navigate
complex design and implementation challenges, consulting humans only for critical decisions or
final approvals. Alternatively, it poses the question of whether humans will continue to lead,
with agents serving as virtual assistants to enhance our productivity.
Naturally, this raises immediate concerns about accountability for the generated code and the
sustainable development of the software project using such tools. Moreover, blind reliance on a
black-box software engineering mechanism poses significant risks.
Challenges and future prospects
Despite the promising approaches, there are still challenges to overcome.
The description of tasks must be precise and detailed in order to achieve optimal results.
Unfortunately, AI still does not offer an automatic “do what I want” based on
a meagre requirement description. ;-)
In addition, the interaction and collaboration of the various agents is a complex process
that needs to be developed further.
One detail of the code generation that we observed in our experiment was that the code files
were always completely regenerated, even if, for example, only details were to be added to a method.
Enhancing the incremental code generation process could significantly improve the stability
of development in increments. The existing approach poses risks such as overwriting
previously implemented segments, the disappearance of features, or the introduction
of errors into already tested components.
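One conceivable mitigation, sketched here purely as an assumption and not as a description of how GPT-Pilot actually works, would be to let the agent emit only a unified diff and apply that diff to the existing file instead of rewriting it completely:

import difflib

# Hypothetical example: the existing file and the agent's revised version of it.
original = """function add(a, b) {
  return a + b;
}
""".splitlines(keepends=True)

revised = """function add(a, b) {
  // guard against non-numeric input
  if (typeof a !== 'number' || typeof b !== 'number') throw new Error('invalid input');
  return a + b;
}
""".splitlines(keepends=True)

# A unified diff leaves untouched code untouched and makes the change easy to review and apply.
print("".join(difflib.unified_diff(original, revised, fromfile="utils.js", tofile="utils.js")))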
Current developments towards ever more powerful LLMs, enormous
context lengths (over a million tokens), LLMs specialized in their respective task areas
and multimodal LLMs are opening up new possibilities for supporting software development.
Advanced agent frameworks significantly enhance the creation, deployment, and interaction
of agents, as well as ensuring workflow stability. This advancement opens the door to discovering
which methodologies will be successful and what new applications can emerge from these innovations.
In our attempto Lab, we engage deeply with these inquiries, crafting bespoke agents and workflows
tailored to diverse scenarios. This approach allows us to thoroughly investigate the capabilities
and boundaries of emerging technologies.
Conclusion
The development of AI has come a long way, from its humble beginnings in the 1960s to today’s advanced
models and agent-based workflows. The opportunities that arise from this are enormous and could fundamentally
change the way we develop software.
The idea of solving complex problems through the collaboration of specialized agents promises
higher performance and efficiency. The first experiments and projects are showing promising results
and give an idea of the potential of this new way of working.
As things stand today, the available tools are still at an early stage and do not meet the high requirements
that we place on high-quality software development. We will continue to monitor progress in this area and
regularly test advanced tools.
At the same time, however, we must also face up to the social, ethical and legal issues
which are already clearly emerging today. It is up to us to decide how we want to deal with
the new opportunities as a society.
Alex Bloss - Head of attempto-Lab.
Creative Mind, Innovator, Critical Thinker, Developer, Hands-on Architect, CTO, Trend Scout and Curious Explorer with more than twenty years of experience.