Launching your new AI Startup in 2023
In the last few months more and more people have been asking me for my thoughts on their AI business ideas, and for help with navigating the space. This post covers the majority of my thoughts on the subject so far. My audience here is those people, not ML experts with extremely deep knowledge who can debate nuance and split hairs on the finest of points.
This post is 50% a harsh dose of reality and 50% advice on how to place your bets.
We need to first clear up some of the hype around ChatGPT, especially because most people who think they can start an ML company right now are inspired specifically by it. It’s a great tool, but a lot of people are so awe-struck by their first impressions of the thing that they refuse to come down to earth and look at it objectively. Yes, the present limitations of AI platforms will probably be solved in the next 5-10 years or sooner, but you don’t have those solutions today. 99% of you who want to launch an ML/AI business are probably not working on hard-core AI research problems, but rather looking to use existing AI platforms to build a business. There’s nothing wrong with that, but if you are “high on ML”, you need to stop thinking we have some magic djinni in 2023 that solves everything just by invoking it by name. There’s a reason the biggest topic in GPT right now is “prompt engineering” to try and steer the model into giving the right answers.
What can AI like GPT do?
GPT belongs to a group of ML models called “Large Language Models” (LLMs); it is only one family of model types. There are many kinds of models, and this one is doing some awesome and impressive tricks. All ML models today are statistical machines, which means they just try to “guess” (with really complicated statistics) what output data tends to fit the input data. Literally every ML system is doing only that right now. There is no “brain” capable of thinking, but many models have gotten so good at building these statistical guesses that they often produce results that look like a human might have done the work.
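To make the “statistical guessing” idea concrete, here is a deliberately tiny, hypothetical sketch in Python. It is nothing like GPT internally (no neural network, no attention), but it shows the core mechanic: the “answer” is just whichever continuation appeared most often in the training data.

```python
# Toy illustration of "statistical guessing": a word-pair counter, not a brain.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat the cat ate the fish".split()

# Count which word tends to follow which word in the training data
following = defaultdict(Counter)
for current, nxt in zip(training_text, training_text[1:]):
    following[current][nxt] += 1

def guess_next(word):
    # The "guess" is just the statistically most common continuation we've seen
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(guess_next("the"))  # 'cat': it followed "the" more often than 'mat' or 'fish'
print(guess_next("dog"))  # None: never seen in training, so there is nothing to guess
```

Real LLMs do this over tokens with billions of learned parameters instead of raw counts, but the “guess the likely next thing” framing still holds.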
When you use a model like ChatGPT (which is different from GPT itself), it has been trained specifically on many prompts and outputs that are phrased like human instructions to the computer, so it can produce the magic-like effect of “understanding my instructions”. It is clever design work, not some sophisticated new machine, that is presently blowing everyone’s minds. Unfortunately this mind-blown state often makes people fundamentally misunderstand the capabilities of the system, because interacting with it “feels like” talking to someone intelligent. OpenAI (the company that makes GPT) had to hire many, many humans to generate lots of text in specific ways so the model can find a probabilistically correct response.
When you think about human mental capabilities, think of these modes of thinking:
(I stole these ideas from psychologist Howard Gardner)
Disciplinary thinking: First, don’t confuse “disciplinary” with “punishing someone for bad behaviour”. It has another meaning related to the study of a discipline such as math, languages, the arts, or any pursuit where a human has to acquire knowledge and understand it. This is most of what you do in school: build your disciplinary mind. It’s not just about rote memorization but also about understanding how to use the tools of that discipline correctly (you can argue that schools often ignore this second part). A skilled “disciplinary thinker” will not just “google answers” from their own memory but will also understand the inner workings of those details, and will consistently outperform unskilled “disciplinary thinkers”. For my software programming audience: this is like the many new developers doing cut-and-paste coding from Stack Overflow posts without knowing “why”, vs the person who actually wrote an absurdly detailed and insightful answer to that question.
Synthesising thinking: This kind of thinking is characterized by taking two different kinds of disciplinary thinking and combining them in some way. You could think of this as understanding geometry and painting, which allows someone to combine the ideas and come up with 3D perspective lines to assist artists. Another great application is Edward de Bono’s “Six Thinking Hats”, which specifies six modes of thinking to apply to some general problem: by combining a problem and a focused mode of thinking, you can get great insights out of humans quickly. Even something like “Write Hamlet as if it was performed by Tupac Shakur” is a good example of synthesising two concepts. A good synthesiser will often know how to take a disciplinary topic and bring it to a conclusion.
(You could argue that the modern education system entirely fails to cultivate this kind of thinking, which I think amplifies the massive “Wow!” reaction many people have to GPT: synthesising is something that modern humans are rarely good at, because we only cultivate minds to do it by accident.)
Creative thinking: The creative mind is able to push the boundary of existing knowledge and go into something new. It can ask entirely new questions or give entirely new answers (not linguistically “new combinations of letters”, but rather conceptually “new ideas”). You could argue that the creation of a whole new technology (like integrated circuits) was creative and pushed boundaries, but that’s not the point. It’s more important to think about the mindset of creativity, because that’s what enables creation: creative thinkers are usually extremely dissatisfied with the status quo and hunting to push the boundaries somewhere new. They look for new knowledge, and they even look to explore what is “already known” and question it again in new ways.
I expect this point will make someone in a discussion forum comment about how confused or angry they are with me. I’m also expecting a lot of arguments that sound like “there’s no creativity, everything is just a remix of an earlier idea, how can anything be original?”. That is not the argument being made here, and these modes of thinking do not need to be mutually exclusive to exist. We’re identifying a mode of thinking that is “creative thinking”, not the macro-event of “creating an artifact of some sort”. And this mode of thinking is different from having disciplinary knowledge or knowing how to synthesise things.
Now that you have a grasp on those 3 modes of thinking, here’s what you need to know about models like GPT:
It can’t offer any creativity. Zero. Zilch. Zip. Nada.
It can’t offer real disciplinary knowledge, but it’s great at “decorative” disciplinary knowledge.
It can synthesise ideas really well.
So, what does that mean?
GPT is only Decorative Knowledge
I’ve started calling a lot of what GPT does “Decorative Knowledge”. Ask it to multiply two big numbers and it will put on a fancy official-looking uniform, some nice makeup, stand on a tall stage, and then proudly and confidently spout out absolute nonsense (see image for example). Ask it to multiply two small numbers like “what’s 5 times 5?” No problem! You will get the right answer.
When your evolved-monkey-brain (which is overloaded in our busy modern world) looks to check for “correctness”, it doesn’t want to rake through the fine details of everything; it instead looks for all the cosmetic appearances of correctness. It defaults to things like, “How could it be wrong? After all, it broke the multiplication work down into several steps, explained itself, AND it’s a computer, which is great at math!” But this ML system doesn’t work that way; it’s just a machine that statistically guesses what comes next based on what it’s seen before. It only knows “when two big things are multiplied, something big comes out!”, like an eager child trying to figure out how grownups do math who just guesses at a big number. It hasn’t been trained on arbitrary large numbers, but it has been trained a lot on very common low-number combinations like “what’s five times five?”, only because those are more likely to exist in books and other ML training data than two big numbers would be.
Large Language Models (like GPT) are not capable of disciplinary knowledge. They can do rote memorization, but they don’t know how to “use the tool” we call “multiplication”, so they can’t actually “do” multiplication… but these models can tell you what 5 times 5 usually (statistically, based on training data) is. Multiplication is one specific case, but the same issue exists for every topic that requires special knowledge, including copywriting, writing homework assignments, coming up with a novel, or writing your company’s strategy document. It can’t do those jobs, but it can pull on what it has remembered. (In the section about synthesis we’ll talk about why it can still write poems for you about any arbitrary topic.)
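If you do want to lean on a model around tasks like this, one hedge is to keep the disciplinary work out of the model entirely. Below is a minimal, hypothetical sketch of that pattern: exact arithmetic is done in plain code, and the model is only asked to phrase the result. The `ask_llm` function is a placeholder for whatever LLM API you actually use, not a real client.

```python
# Hypothetical pattern: let code do the "disciplinary" work, let the model do the wording.
def ask_llm(prompt: str) -> str:
    # Placeholder for your LLM call of choice; assumed here, not a real API.
    return f"[model's phrasing of: {prompt}]"

def answer_multiplication(a: int, b: int) -> str:
    result = a * b  # exact and always correct, unlike a statistical guess
    return ask_llm(f"Explain in one friendly sentence that {a} times {b} equals {result}.")

print(answer_multiplication(123456, 987654))
```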
It’s not creativity either
A lot of humans get really confused here, because when these models present us with some information we haven’t seen before, our first instinct is to think “Wow! That’s smart!”. But the model has been trained on more information than you have been exposed to, so a lot of the “That’s a new idea!” or “Wow, that’s smart!” moments you feel are actually just because you personally don’t have exposure to those ideas. If you’re old enough to remember what it was like before the Internet, recall that it was an impressive show of intellect to have read the Encyclopedia Britannica’s entire set of 32 volumes: enough to make the average person think “wow, this person knows a lot!”, but not enough to actually accomplish something in any of the disciplines from the books, and still not enough to discuss anything not covered in the books.
When these Large Language Models output some idea that’s new to you, be careful that you don’t mistake “it accessed some already existing idea” for “it is pushing the boundaries of knowledge”. I haven’t been able to convince GPT a single time to take several existing ideas and combine them into a genuinely new and sensible concept, and I haven’t been able to push it to think creatively. By design it is so heavily bound to the patterns it already knows that it just has no way to do it.
So what does that mean? A bot like ChatGPT, which was trained on an absolutely massive data set, is pretty good at achieving basic knowledge. Enough that for some basic tasks you could possibly replace some entry-level knowledge workers by using the bot directly, because it gives you the same answers that entry-level knowledge workers in a discipline can give you: there was an abundance of that material in the training data, and the complexity at the entry level is usually low enough to be modeled simply. Complexity at a very senior level, on the other hand, is so full of edge cases, nuances and exceptions to the rule that sourcing a data set like that, let alone training on it, would be well beyond unreasonable (right now in 2023), especially across all arbitrary disciplines.
Some people I’ve talked to who are still awe-struck by the bot have shared stories with me that they had it write their “2023 Product Mission & Vision”, “Strategy Doc”, or “user stories”. Sometimes people tell me the results are worthless, but other people tell me their results are amazing and on the level of what they had already produced. The unfortunate reality here is that GPT is producing answers that merge some of the details you provided with an “average strategy” derived from everything it knows. The result is not specific or thought out, but it’s written to sound like strategy (you only asked it to project these facts into “strategy space”). Usually this strategy document doesn’t really capture the nuance of your business, because the model doesn’t know how to write a great strategy document, so it will miss critical details that an expert would not. It will miss the creativity required to push the boundary, ask hard questions, and refine the business’s focus. So what’s the reason there’s a huge perception gap between two groups on the same task? Skill level at that task. Disciplinary experience at creating a good strategy is not very common, and for sure most people charged with a task like that are just winging it.
This is why I won’t use GPT for challenging intellectual tasks that require depth and need to be correct, and I won’t use it to create something new. However, I can use it for things that are repeatable entry-level knowledge, that humans can correct for themselves, or for which creation is really just a form of “search” for existing ideas.
It’s not a magic wand and it has no agency
It’s just a text prediction machine. This means it cannot look up arbitrary data at arbitrary times; it must be fed everything it knows in advance. It cannot orchestrate multiple complex computer systems or perform tasks that require agency. It’s not just that “it doesn’t have access to the internet”: entirely new systems would need to be researched and developed to handle any task in the form of “interact with the world” or “interact with some other system”. It’s only a statistical model; it doesn’t have the “limbs” to walk around on the internet, and it wouldn’t know how to use them even if it had them. It’s a hard research problem, so you will need to develop a real solution with real researchers and software developers to solve it.
Large Language Models are awesome at synthesising
The reason GPT is so good at looking like it can do smart things is because it can take a concept from one space and project it into a new one. The authors of ChatGPT did this in a very creative way: they project GPT into a space that is “like a chat bot” by training it on a huge data set of chat conversations, and it is further trained on a huge data set of responding to instructions (they call it “InstructGPT”). So they’ve taken a huge 800GB data set of raw data and facts, taught the model to access that data in a way that looks like it understands instructions, then taught it to work in a way that feels like a chat bot, and the last bit of the magic trick is the chat-bot-like user interface. This is all moving ideas from one space and putting them into another: it’s not “wikipedia style” knowledge, it’s “chat style” knowledge, with the same content. The core technology has been around for a long time, but now it finally got a big data set with a layer of great design on top.
This is why you can ask the bot to write a poem about any arbitrary concept. It has a basic set of data about that arbitrary concept and it is able to find other concepts similar to what you described; it can then “average out” those ideas and “project” them into the space of what humans expect a poem to look like. It’s like sliding a square from left to right on a grid, but with text, and in a highly complicated way that feels like we’re talking to someone instead.
A lot of what you do in ChatGPT is look up an “average” of some information and have it “presented” in some special format.
How should I build a business based on GPT or ML?
This means your best bet for launching some kind of useful AI product based on GPT or LLMs is to either define a space that semi-arbitrary things get projected into, or define something that can be projected into semi-arbitrary spaces, but not both. That’s a really abstract statement, so here are some examples:
Take a set of product attributes for an ecommerce website and project them into a “product description space” to get a few paragraphs of sales copy for a product (there’s a sketch of this after the list).
(People will often label this kind of task “creative work”, but it’s often really just a synthesis task: move an attributes list to a description space.)
Take a Jira ticket or pull request and project it into a “review space” to get a review.
Take English instructions and project them into a “code space” (think: GitHub Copilot, or one of the new SaaS businesses that popped up to let you query a DB with English language).
Find a space to project into, or find something that is useful to project into any space (this second one is really hard to find examples for).
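Here is the promised sketch of the first example. It is a hypothetical illustration only: the prompt template is what defines the target space (“product description”), and the attributes are the thing being projected into it. The `complete` function stands in for whichever LLM completion API you choose; it is an assumption, not a specific vendor’s client.

```python
# Hypothetical sketch: project structured product attributes into "product description space".
attributes = {
    "name": "Trailblazer 40L Backpack",
    "material": "recycled ripstop nylon",
    "capacity": "40 litres",
    "features": "hip belt, rain cover, laptop sleeve",
}

def build_prompt(attrs: dict) -> str:
    # The fixed wording defines the target space; the attributes are what gets projected.
    facts = "\n".join(f"- {key}: {value}" for key, value in attrs.items())
    return "Write two short paragraphs of ecommerce sales copy for this product:\n" + facts

def complete(prompt: str) -> str:
    # Stand-in for whatever LLM completion API you use; assumed, not a real client.
    return f"[model output for a {len(prompt)}-character prompt]"

print(complete(build_prompt(attributes)))
```

Swapping out the fixed wording (“review this pull request”, “write a poem”) changes the target space; swapping out the attributes changes what gets projected into it.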
TL;DR:
Projecting one concept into a new space is your best bet for success. Creating a new space, or expecting deep understanding, only produces decorative kitsch: results that are nice to look at and kind of entertaining but don’t really do anything.
So what are the risks of starting a business around GPT or other AI models?
A simple model can be copied overnight
…and launched the next day
…by basically anyone.
In fact, it usually takes longer to do all the legal paperwork to create a company than it does to copy the software that runs these new AI businesses that popped up overnight. Right now, ML models are nearly a commodity. With the number of ML providers that expose ultra-simple APIs, and the number of open source models on GitHub and Hugging Face, you don’t even need to understand the basics of ML to get started. Seriously! There are even services that let you click-and-drag Slack webhooks together with ML models you know nothing about and wire them all up in a day.
The bar for entry has never been lower. That said, the range of result quality is widening: great results are now available as commodities, but truly boundary-pushing models require ML expertise to accomplish. I should also mention that for most cases “great” models are probably good enough to get started, and most models out there you might compete against are probably in this range too.
Worse, if you are using some fine-tuned model trained on a private dataset, then unless that dataset is absolutely huge, tied into some very complex lifecycle, or the ML model itself is highly complex internally, it can just be copied by anyone who can interact with your model. All they need to do is ask it a bunch of questions, get a bunch of answers, and voila: they now have their own training set to re-create something just like the model you worked hard on. You’re only somewhat protected when things are massive in scale or complexity, or when you find some way to use the model’s outputs while still hiding them from everyone.
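To make the copying risk concrete, here is a deliberately simplified, hypothetical sketch of that harvest-and-retrain move. The `query_target_model` function is a placeholder for whatever public API the business being copied exposes, and the output format is just one common fine-tuning layout, not any specific vendor’s requirement.

```python
# Hypothetical sketch: harvest another model's responses into your own fine-tuning set.
import json

def query_target_model(prompt: str) -> str:
    # Placeholder for calling the model you want to copy; assumed, not a real API.
    return f"[target model's answer to: {prompt}]"

prompts = [
    "Summarise this support ticket: ...",
    "Write sales copy for a 40L hiking backpack",
    # ...a few hundred or thousand more, covering the use cases you care about
]

# Each harvested prompt/response pair becomes a training example.
with open("distilled_training_set.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "completion": query_target_model(prompt)}
        f.write(json.dumps(record) + "\n")
```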
What you need is a moat.
In business and economics “a moat” is a metaphor that refers to something about the business which provides a competitive advantage. Unfortunately, “Doing AI” is not a moat for the reasons mentioned above.
If you have absolutely nothing defensible to compete on because your idea can be so easily copied, you can probably only use a Brand Moat: the only reason customers use you is because you’ve marketed your SaaS as something better, even if it’s not true. It’s not a fun strategy, but it can work for a while.
If you go this route, you need to compete on brand and marketing, because very quickly everyone will offer your product at similar quality for less money.
You could also look at an Intellectual Property Moat based on very specific fine-tuning data, but as I mentioned in the previous section, if it’s not absolutely massive, someone can just harvest a few hundred or thousand responses from you and go make their own model with data you provided them. You would need to constantly compete to refine your model and be better than everyone else in that space. The best option here is to tie into some live data flow that is also private, since you always get the data first, which raises the barrier for someone else to replicate your model. You can also try to make custom models per use-case (imagine multiple customers with different data) if you can find a way to scale that approach.
You could also establish a Switching Moat, which is usually tied to the cost of onboarding with your SaaS. Imagine you provided an API for others to integrate with: they have to spend time and money programming this integration, and switching away also creates a cost. Some cloud providers have great switching moats.
You could try to establish a form of Legal or Certification Moat, which raises the bar for newcomers to compete for your market share. Maybe you get a legal certification for something in healthcare or law, for example, and are awarded a contract in one specific country to do some niche task.
Lastly, try to sell shovels
Everyone selling ML right now is digging in random directions hoping to strike it big. So sell a shovel. Make it easier for people to try digging. What’s a good shovel? If I knew, if I had the time, and if I had the money, I’d go start that business instead. As an added bonus, try to strengthen your business for when all this hype dies down: don’t build a line of business dependent on “new AI businesses need shovels”; instead look for a business model that is merely accelerated by it.