Experiments with GPT Vision for Modelling: A Journey from Screenshots to Whiteboards
The landscape of artificial intelligence is rapidly evolving, and OpenAI’s recent announcement of GPT-4 with vision capabilities is a groundbreaking development in multi-modal language models. Although the feature is still in preview, the Curiosity team is always on the lookout for innovative solutions, and we were immediately captivated by the potential of integrating this technology into our Test Modeller product.
Our journey with GPT-4 began with a simple, yet ambitious, goal: to explore and understand how this cutting-edge AI could enhance our capabilities in model-based testing and quality assurance. In this blog, we showcase some of our early experiments using GPT-4’s vision capabilities within Test Modeller — a tool designed for collaborative, quality-focused development.
Preview #1: Screenshots to Models
In the realm of software development, it’s rare to start a project from scratch. More often, we find ourselves needing to model existing applications or business logic. This led us to an intriguing question: How effectively could OpenAI’s GPT Vision handle the task of creating models from screenshots of existing applications?
To put this to the test, we chose a practical example: a registration form from an e-commerce system. Our goal was to convert this form into a model using GPT Vision. By feeding the screenshot into the system, the co-pilot efficiently analysed the form image and translated it into a linear model.
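To give a flavour of what such a conversion involves, here is a minimal sketch using the OpenAI Python client (v1+). The model name was the vision preview release at the time of writing; the prompt and screenshot URL are illustrative, not Test Modeller’s actual implementation.

```python
# Minimal sketch: ask GPT-4 Vision to read a form screenshot and return the
# ordered steps a user takes. Prompt and URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision preview model at time of writing
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "List, in order, the steps a user takes to complete "
                     "this registration form, one step per line."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/registration-form.png"}},
        ],
    }],
    max_tokens=500,
)

# The ordered steps map naturally onto the nodes of a linear model.
print(response.choices[0].message.content)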
This model served as an excellent starting point, but it did not fully capture how the system reacts to different inputs, or the underlying business logic that ultimately triggers error and success conditions. This is where the power of modelling comes into play: subject matter experts (SMEs) can refine and extend the model with their knowledge of negative scenarios and edge cases, rapidly creating complete system specifications that can then auto-generate tests.
Preview #2: Flowcharts from Non-Exportable Applications
A common hurdle in software development and process documentation is dealing with applications that lack export capabilities. These applications, often critical to business operations, become siloed due to their inability to integrate or share data seamlessly with other systems. This limitation not only hampers the efficiency of workflow documentation, but also leads to a significant increase in manual effort and potential for error.
We un-silo this information by importing computer-generated flowcharts directly into Modeller. This process begins by creating visual representations of the application’s processes, which are then fed into the system. The co-pilot intelligently analyses these visuals, interpreting and converting them into detailed, editable flowcharts within our modelling tool.
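Because the target here is a graph rather than a linear sequence, the vision model can be asked for structured output directly. The sketch below requests a simple nodes-and-edges JSON; the schema is our own illustration, not Test Modeller’s actual import format.

```python
# Illustrative sketch: transcribe a computer-generated flowchart image into a
# nodes/edges structure that a modelling tool could then import.
import json

from openai import OpenAI

client = OpenAI()

prompt = (
    'Transcribe this flowchart as JSON with two keys: "nodes", a list of '
    '{"id", "label"} objects, and "edges", a list of {"from", "to"} objects. '
    "Return only the JSON, with no surrounding text."
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/process-flowchart.png"}},
        ],
    }],
    max_tokens=1000,
)

# Tolerate any stray text the model wraps around the JSON object.
raw = response.choices[0].message.content
graph = json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
print(f"{len(graph['nodes'])} nodes, {len(graph['edges'])} edges")
```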
This approach is particularly effective for applications that are visually rich but lack export functionality. By bypassing the need for manual data entry or complex integration work, we can swiftly convert static images into dynamic, interactive models that accurately reflect the application’s workflows. It also yields very good results: because the images are computer generated, they are crisp and consistent, and therefore an inherently easier vision problem to solve.
Preview #3: Wireframes to Flowcharts
The transition from design wireframes to comprehensive flowcharts is a crucial step in the model-based testing process. Wireframes are the skeletal framework of a digital application, outlining its structure and layout, without delving into the finer details of design. Our challenge was to convert these wireframes into detailed flowcharts that not only represent the structure, but also encapsulate the application’s flow and functionality.
The system takes the wireframes and intelligently interprets them, identifying key elements like navigation menus, input fields, and user interaction points. From these elements, the Modeller co-pilot constructs a flowchart that maps out how the components connect and interact, turning a static layout into a dynamic flow of processes.
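Fine print in a wireframe is easy for a vision model to miss, so it can help to request higher-resolution analysis. The sketch below uses the API’s optional "detail" setting for this; as before, the prompt and URL are illustrative rather than Test Modeller’s actual implementation.

```python
# Illustrative sketch: inventory a wireframe's interactive elements before
# connecting them into a flow. "detail": "high" requests finer-grained
# image analysis at higher token cost.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "List every navigation menu, input field, and button in "
                     "this wireframe, then describe how a user moves between "
                     "them."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/wireframe.png",
                           "detail": "high"}},
        ],
    }],
    max_tokens=800,
)

print(response.choices[0].message.content)
```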
Two challenges stand out in this process. First, wireframes are often high-level and may not describe every user interaction, so their nuances must be captured carefully in the resulting flowchart. Second, wireframes rarely contain the business logic that ultimately sits behind a front-end design.
Preview #4: Whiteboard to Flowcharts
Whiteboarding is a key, collaborative approach to ideation and design for most enterprises today. Using GPT-4’s vision capabilities, we capture these initial bursts of creativity and structure them into actionable models.
We achieve this by taking images of the whiteboard drawings and importing them into Test Modeller using the Modeller co-pilot. This technology allows us to convert these often chaotic and unstructured drawings into clear, organized flowcharts and models.
The process begins with a simple photograph of the whiteboard. The Modeller co-pilot then analyses the content, deciphering text, diagrams, and even hastily drawn shapes. It intelligently recognizes the relationships and hierarchies within these drawings, transforming them into a digital format that serves as the starting point for more detailed models.
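Whiteboard photos usually live on a phone rather than at a URL. The API also accepts base64-encoded images, as in the sketch below; the file name is illustrative, and the prompt deliberately asks the model to flag anything it cannot read, given the legibility limits discussed next.

```python
# Sketch of sending a local whiteboard photo as a base64 data URL.
import base64

from openai import OpenAI

client = OpenAI()

with open("whiteboard.jpg", "rb") as f:  # illustrative file name
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the process sketched on this whiteboard as an "
                     "ordered list of steps and decision points. Flag any "
                     "text you cannot read confidently."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    max_tokens=800,
)

print(response.choices[0].message.content)
```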
While this feature has opened new avenues in capturing and digitizing spontaneous ideas, it’s important to acknowledge its current limitations. The accuracy of converting these drawings into models largely depends on the clarity of the whiteboard sketches. In instances where drawings are overly abstract or text is illegible, the system may face challenges in accurately interpreting the content.
Human-centric development, AI acceleration
In our exploration of GPT Vision through the co-pilot functionality in Modeller, we’ve made significant strides in enhancing the modelling process with additional sources of data that often exist only as images. From transforming application screenshots into detailed models and importing flowcharts from non-exportable applications, to converting wireframes and whiteboard sketches into structured flowcharts, each method has showcased the power of AI.
A key insight drawn from all of these examples is the inherent limitation of relying solely on image-based data. This underscores the critical role of human expertise in the loop, scrutinizing AI-generated results and refining them with SME knowledge. This is essential in ensuring the quality and accuracy of any AI-generated content.
The synergy between AI capabilities and human expertise through modelling paves the way for a more accurate and accelerated approach to software quality. To join Curiosity on our journey to create this synergy at enterprise scale, book a time to speak with one of our experts today.
About the author: James Walker holds a PhD in data visualisation and machine learning, in the field of visual analytics. He has given talks world-wide on the application of visual analytics and has several articles in high impact journals.
Originally published at https://www.curiositysoftware.ie.