2nd Gen AI Products: Design Decisions for AI MVP

Varun Aggarwal
Sep 11
6 min read

You have an AI idea which can deliver significant value to potential users and customers. Now you wish to build an MVP to demo to your first users and be able to close your first customers or design partners, who will closely work with you to a PMF (product market fit). What is the recipe to get there in reasonable time with high chances of convincing your users? What to take care of and what are pitfalls? Let us figure out. 1. Figuring the AI feature set

The key in developing MVP is to deliver ‘enough’ customer value that leads to a buy/use decision. The key is thus to make a list of features, their value to customer (high/must-have, good-to-have, may-be) and time/ease to develop. Based on these you will prioritize features which have high customer value and low time to develop. For each of the features, you will have to clearly define the feature and go the nine-yards of wireframing, designs, clickable demos, etc. so that you know exactly what has to be delivered.

Feature Fog Pitfall: A pitfall here is to not have a documented and visual/logical definition of the feature and what is built is different from what was expected and doesn’t deliver the expected value. Artificial Intelligence based products add another level of complexity in building your MVP. It is about AI feasibility/accuracy of the feature. Will AI be able to deliver user satisfaction to the level you want? If the output is objective, say a grade, or say revenue of a company detected automatically, it is easier to measure the accuracy of the parameter. It becomes trickier, when you use generative AI to build a subjective output, say generate a video, get insights about a company from a report, or generate a blog outline. Different people may have different views of what is good, and even if you have a consensus, it is hard to automatically test good responses vs. others.

Social Media Delusion Pitfall: Do not believe what your social media feed tells you AI can do. It doesn’t tell you how well AI does the task!

We will deal with this problem later, but what is important for now is that we have a third parameter introduced in your feature listing and decisions. It is AI feasibility/accuracy. Let us look at a rubric for this:

Easy of build, easy to test: A feature where literature/SOTA results show ease of building. Testing should be tractable.
Easy of build, hard to test: A feature where literature/SOTA results show ease of building. Testing will need large datasets/human interventions.
Hard of build, easy to test: A feature where there is little literature/SOTA results of trial or success. Testing should be tractable.
Hard of build, hard to test: A feature where there is little literature/SOTA results of trial or success. Testing will need large datasets/human interventions.

** Accuracy/testability is not the only parameters, but depending on your use case include, latency, cost, etc. issues. We consider it in next section.

Now, you need to take into consideration the new ‘AI buildability’ parameter into account, while making feature decisions. If your necessary/must have feature is AI feasible, you are all set. However, if the core features of your AI product is hard to build, that is what you should prioritize first and establish product feasibility. If you are not able to build these, you wont have a product at all!

Let us work to the second stage now of ‘defining’ AI features: the success criteria for your AI feature. For non-AI features, this is relatively easier, in form of objective business logic and UI/UX screens. What is it for AI features?

2. AI Feature Definition/Success Criteria

Step 1: Comprehensive set of criteria

Your success is not the accuracy of results alone. Depending on the application, it could include latency – response time of your system. For example, if it is a real time video interviewer, you will care whether your AI can process and respond in time. In case of even a text response, if the tool says take a few minutes to respond, it may break the use case. Another often neglected parameter is estimated cost of AI operation. Your customer will be ready to pay a certain price based on value generated. You need to calculate whether the pricing is feasible, given your compute and tool costing. The pricing may not be feasible today, say with a paid service, but with building your own or over open source and with economies of scale kicking in, the pricing may turn feasible in future. These calculations needs to be made now. There are other parameters around data privacy, AI ethics, which must be thought through at this stage to come up with a comprehensive set of criteria.

Step 2: Defining the Success Criteria

Let us deal with model accuracy. If your output is objective information, it is easier to test. You need to create a good test data set which represents the diversity of expected user inputs and label them properly. The dataset size should be reasonable depending on your application. Set an accuracy benchmark on this data set and test your algorithms to get there. The most important thing to consider is that your test and train sets are different and they don't explicitly or implicitly get mixed up.

Let us consider the case where your model generates a subjective output- say text or video. Here, a small set of examples, say in the 10s, is fine to begin with. What is more important is creating a set of responses to the inputs and having expert consensus that these are the ideal expected responses. This will help you build the gold standard expectation.

Tip: Your output might be subjective but can have objective information useful for benchmarking. For example, a generated image is subjective, but whether it has a person, male/female, certain objects can be determined with high accuracy.

There are various ways to generate ideal responses. You can hand-write or design the ideal responses yourself or by help of an expert. For example, if your algorithm will generate images based on a text prompt, first find the ideal acceptable images for the prompts. Or if your algorithm will generate a blog outline, write yourself an ideal blog outline for a topic and test it with SEO experts. You could use online crowdsourcing tools like MTurks or Prolific to generate responses, and do an expert check on it. You can use publicly available data: say you want to generate a summary of stock market status in the energy company segments, you can find such summaries in analyst reports/YouTube Business Channel broadcasts. The main consideration is to be confident of the accuracy of the gold standard data.

Confirmation Bias Pitfall: Often, AI engineers will generate outputs using AI and then feel that those are the right outputs, than what the problem demands. Write ideal outputs blind to AI outputs when you start!

The first set of validated outputs will help you manually optimize your algorithm. Please note, you are not using the gold outputs in the training process here (say, fine tuning, or in prompts), but adjust techniques/parameters ‘unsupervised’ to these and manually compare against gold standards. These gold set should remain untouched in explicit use.[1] The objective is to bring the algorithm in some acceptable range of gold standards.

After this you may do a more extensive testing. Remember you are still at MVP stage, so do not go overboard. Make sure you are continuously sampling user data to see that the model is working as per expectations and fixing it for undesired behavior.

Tip: Do not miss to include guard-rail criteria and test cases. Important even at MVP stage!

3. Using open source vs. commercial tools

Model building is beyond the scope of this blog. For completeness we will dwell on one important consideration: whether to build from scratch/open source or use commercial AI services. Our recommendation is that it is useful to use commercial AI services, because that will give you speed of implementation and lesser issues to worry about - corner cases, deployment infrastructure, latency, guardrails, etc. Demonstration of working technology, speed and ability to quickly iterate is key at MVP stage. The downside is that this may turn out to be costly, which we recommend can be optimized as you scale. At scaling point, you can move to more in-house solutions for better control, cost and data privacy considerations.

Rounding it up,

Start with choosing the right features to build taking into consideration AI feasibility - ease of building the model and testing
Define your test criteria for AI feature properly looking at comprehensive list of parameters and having a gold set criteria.
Develop a quick technology demonstration using commercial tools, which can be optimized later.

Happy building! *[1] You can create a separate set for fine tuning, or prompting etc. but not touch the gold set. In this blog, we concern ourselves to criteria and training hacks are beyond its scope.

Posts

2nd Gen AI Products: Design Decisions for AI MVP

2. AI Feature Definition/Success Criteria

3. Using open source vs. commercial tools

Recent Posts

Comments