What does Consensus CTO, Jeffrey Sullivan, have to say about Generative AI?

2 minute read
Jeffery Sullivan

Team Consensus: Can you explain what generative AI is?

Jeffrey Sullivan:
Sure. So generative AI is a segment within the field of AI that is involved in creating content. And that content can be text, it can be audio or video, or music – or even computer code, which frankly is just a variation on text. So the idea is that you provide the generative AI system with input or what’s called a prompt, which is essentially asking it, “produce this kind of content for me.” And then the generative AI will produce content that is responsive to that prompt. So it’s really all about, as its name suggests, generating content for you based on a specific prompt.

Some examples of that: ChatGPT kind of burst into the public’s consciousness over the past couple of months. Perfect example of generative text model. Google has one called Bard. Microsoft licensed OpenAI’s ChatGPT 3.5, and now 4, which is all about generating all different kinds of textual content, including even source code. There are some specific use cases where source code is getting generated where you can ask it to write code for you. And there are specifically tuned models around that called Copilot that are also produced by OpenAI.

On the imagery side, there’s tools out there like Midjourney and Stable Diffusion, which basically describe in a sentence or a longer prompt, a picture that you want and it’ll produce it for you. Sometimes with humorous or horrifying results, and sometimes with just amazingly clear and crisp results. There are also generative AI systems that will produce spoken text, but that’s really probably a bit of an overkill for generative AI. There’s a lot of just synthetic speech that can be done very well now. Although a lot of the models for that were generated via AI, but don’t use AI proper.

And then there’s obviously some systems now that are doing music. I was watching, I happened to like the band Oasis, which maybe I dated myself a little bit. But they broke up a while ago and some fans had gotten tired of them waiting for them to re-form and do another album. And then they created an eight-track album with generative AI in the style of Oasis, and it’s actually not bad. It’s frankly impressive what you’re able to do these days. I think recently there was one where Drake and, I can’t remember who it was, they did a song in his style and their music label actually sued them to take it down because they said it was infringing on him. We can talk a little bit later about why that might be true in the risks. But generally speaking, generative AI is all about producing content based on a prompt, and it can be content of all different kinds of content.

Read more of the interview with Jeffrey Sullivan below –

Jeffrey Sullivan:
Sure. Foundational models are the basis. As you can imagine by the name, they’re the basis of the generative AI system. So every generative system starts with some model that has been trained on a massive volume of samples. So for example, ChatGPT 3.5, it’s built on what’s called a large language model. Large language models are the foundational models used by text generation systems. And they’re basically, they go and they scan millions or billions, in most cases, of documents, and they kind of crunch all of those together and understand what makes documents tick. And so these foundational models are all of the examples of the kind of content that you want to create. And usually it’s measured in billions. And now they’re actually starting to work on trillion factor models that take in just massive amounts of information. And all of that is used to teach the model what this content looks like, what the rules of creating content are, what good content is.

Well, let me take a step back. It doesn’t really understand good or bad, but it understands content styles, it understands what’s typically done, it understands kind of the tropes and patterns, which is a very complicated calculation process. But it really comes down to, if you want to really simplify it, it’s just like anybody studying that starts to understand the patterns, the rules in the system, when people break the rules who are really good at it, when people break the rules who are really bad at it. And it’s just all of that stuff just by constant repetition, repetition, repetition is built out of these foundational models. And they’re the basis on which all of these newer models get created. So everything starts from one of these foundational models and then adds something to it.

Jeffrey Sullivan:
Sure. So I think that there’s different answers when you talk about short term versus long term. I think in the short term, the big benefits that we’re going to see are, especially in generative AI, one of the early wins is going to be sort of administrative and assistive help, which is things like looking through a large body of information and summarizing it for you. So for example, we have ourselves an internal model that can take a look at 50 or 60 or 150 page health record and it can summarize it for you to say, “Okay, this is a medical record for a patient who has this. Here are their symptoms. Here are their medications, their issues.” And give you that quick brief. Otherwise, you’re going to be reading 50 or 100 pages.

And so if you can imagine being a doctor coming in to see your patient, when I go to see my doctor he’s still using paper, and he’ll flip through five or six pages in my folder and kind of get the gist of what we’ve recently talked about. But he doesn’t flip back to 30 or 40 or 100 pages back. And that can be a problem because sometimes we’ll talk about something, I’ll say, “But no, we did that two years ago.” And he’ll flip back further, “Oh, yeah.” Having AI that can remind you of all the salient features for something, as you say, go in to see a patient, that’s really helpful.

Taking a look at the entire file and looking for correlations that you missed, looking for tests that were run in a different location that weren’t in this file and bringing that all together, these are great examples of where generative AI can take all of the data that’s around, kind of bring it together and then synthesize it for people. I think that’s really valuable in the short term. I think assistive actions is where we’re going to be for the foreseeable future.

And I think they call these Centaurs, after the old mythical creature that’s half man, half horse. And the idea is that there’s a human and an AI, and that the human uses the AI as sort of an assistant or as a tool that gives them feedback, but it’s not doing stuff on its own. And that’s really important because of some of the risks of generative AI systems, which I think we’re going to talk about in a little bit, so I won’t talk about it now. But the key here is that there’s an awful lot of pie in the sky talk – which is, “oh, the system could diagnose diseases by just scanning through all of your medical records” and it can point out these problems before anybody notices them. And these are things that are probably going to be true in the long run. But in the long run, I think we’re talking 10 or more years. And I think it’s mainly because of the risks.

But in the short run it’s going to be, “Hey, you should look at this,” or, “You should think about this.” And then having the human being take a look at that, apply some knowledge to it and say, “Yes, that’s useful.” But having them get some of these pieces of information really quickly as opposed to taking a longer time to do it. So sort of an assistive function that is guiding human intelligence rather than replacing it when it comes to healthcare.

Now, that’s on the clinical side. On the administrative side, I think that generative AI and AI in general can be a lot more beneficial a lot quicker, because it’s more about optimizing workflows and improving efficiencies and freeing up human intelligence to focus on those more high-value things. So for example, summarizing medical records for insurance purposes. That’s something that you’re still going to have somebody review it, but now instead of having human beings who currently go in, code an entire file, QC it, put it together, hand it off to somebody, then q c their q c’ing , you can have the AI do that first pass of work, which is the bulk of the work, and then have your humans q c things. And you can take a process that might have taken days and shrink it down into hours or even less than an hour.

And this is an area I think administratively where there’s a little bit more risk tolerance, because what you’re talking about here is already a system that is prone to human error and has a lot of inefficiency baked into it. And having a system that’s still going to have those checks and balances but doesn’t necessarily put somebody’s life in the balance, you’re just going to have a lot easier of a time moving that forward in the short term.

I think that everybody’s very sold on the long term future, but in the short term, I think it’s prudent to focus on things that might be able to find things that you can then act on with human intelligence, rather than, I think what everybody thinks of as kind of AI doing things, which is what we internally call “straight through processing” where a human being never has to touch it because it’s all automatic and it all does the thing. I think that’s where there’s still a lot of risk, especially in the clinical perspective, but really everywhere. These systems are highly accurate, but even a system that’s 99% accurate is going to make a mistake one time in a hundred. So that’s where you have to have feedback loops and monitoring mechanisms baked into it.

Jeffrey Sullivan:
Oh, yeah, a hundred percent. In fact, I think in some ways industries where there is less risk to life, I think it’s more amenable to doing some of these more advanced things a little earlier. So I think if we think about it, there are certainly high stakes in financial services or manufacturing or legal, but they’re not exactly life and death in most cases. And life and death seems to be the biggest of those risk levels. So in terms of manufacturing, invoice processing, receipt generation analysis and approvals in summarizing legal briefs, that’s a huge, it’s a big industry right now. And there’s a lot of automation that is put into that. But, AI can do a lot of really great stuff there, and generative AI can do stuff that really has totally required humans at a certain level at this point, and I think that they can do an awful lot of that.

I think this is where we run into some of the challenges, and we’ll talk about that in a moment. But I think that generative AI and AI in general is reaching a point where practical AI or applied AI is starting to become something that is just, it’s seen everywhere. I think we’re right now in that sort of investment cycle like we were in the blockchain period maybe 5-7 years ago. All you need to do is say “ChatGPT for fill-in-the-blank” and somebody’s going to throw some money at you. And I think we’re in a little bit of that vaporware phase where there’s going to be a lot of, what’s the word, buzz thrown at this that isn’t necessarily real. But I think we’re going to see, really starting a couple of years ago, we started to see some useful applications, and right now a bunch of stuff is starting to hit the market that is really quite good, with an asterisk next to it, which is there are still some risks. But yes, tons of fields are going to be relevant for this.

Really everywhere, all the way down to things that you might think aren’t super amenable to this like writing software code itself, which starts to lean into some of the more science fictional risks that come into this. But even being able to ask a generative AI system, “Write me some code that does this,” or, “Write me a code for this algorithm,” or, “Create for me a website that looks like this.” We’re getting to the point where it can do things like that, which is really quite astounding and incredible.

Jeffrey Sullivan:
Oh, absolutely. I think that in the short term, that is the principal place that you’re going to see, I think, enormous value. I think that increasing productivity is the thing that we can do right now today with generative and other kinds of AI systems, and it delivers meaningful value. I myself have used ChatGPT to write me a job description. And I look at it and I tweak it and I say, “Okay, that just saved me half an hour of work.” That times’ a hundred over the course of a month is not at all out of the question.

I think it is useful for things that are sort of ancillary to your primary work, but it’s going to be creeping more and more into things that are part of your primary work. And I think that we’re getting fairly close to things that a human assistant could do right now, and can be done with an AI assistant. Not necessarily to do things completely unsupervised, but with supervised oversight, be able to produce just an enormous, I mean, I think world-changing amount of productivity enhancement. When you go from reviewing and tweaking something to having to start from that blank page, that’s a world of difference.

Jeffrey Sullivan:
So here’s where we get into things that are a little bit more speculative, and frankly, a little bit more alarming. In the long run – and this sounds a little bit sort of dystopian future, science fictiony – But I think in the long run, a lot of knowledge work that we used to think was the thing that wasn’t going to get disrupted by automation, is going to be disrupted. So everybody has kind of taken it as an article of faith that robots are going to replace manual labor. And you see these robotic devices building cars and doing manufacturing, and you say, “Yeah, okay, we can do that, but it can’t do the really hard stuff, the brain stuff.”

I think that we’re going to find that the vast majority of mental work, white collar labor, can be displaced by artificial intelligence over the coming decades. And I think on that timeframe we’re talking about seismic upheavals in work and productivity that we as a society aren’t really prepared to deal with just yet, and we’re going to have to figure that out.

In the short run, I think it’s going to make a lot of people a lot more effective, but it’s also probably going to start eliminating some of the lower tier jobs in knowledge work. For example, I think it was Buzzfeed that eliminated a bunch of their writers because they could ask ChatGPT or similar generative tech systems to write them an article about a certain thing, and it was good enough that they could use that with just their editors tweaking it and didn’t have to have a bunch of writers on staff or even freelance writers, because it was just so easy to generate that content on their own. Now that’s kind of entry level stuff. They’re not doing that for 500 page Gartner reports.

But to be candid, it’s only a matter of time until Gartner is going to be able to produce all of their reports without having analysts do any of this work, because they can just go out there and just suck in all of the content that’s out there and produce these reports, and just have editors working on it. That’s a very alarming thing when you think about it in the broad scope of what work is going to get eliminated over the coming decades. And it probably will happen faster than we think. If you told me, “Jeff, you’re crazy, it’s not 20 years, it’s five.” I would say that seems pretty aggressive. But if you told me it was 10 and not 20, that wouldn’t surprise me. These things always happen both faster and slower than we think. Okay? And it feels like we’re on the edge of a seismic shift in this area.

And I don’t like to be alarmist, but there’s a lot of stuff that we haven’t figured out about this in terms of people and what to do with people who get displaced, that we as a global society are going to need to think about, because there will be a lot of knowledge work that gets automated away in the coming decades.

Jeffrey Sullivan:
So I think it’s true and I think it’s a good thing, but here is where I’ll dip into the biggest single risk with generative AI that seems to get mentioned, but almost in passing. And that is—that, these systems, the AI researchers call it that they’re “prone to hallucination,” which is to say, they make it up. I like to call it they fabulate, they make up things. And fabulating is different from lying, right? In lying you know that you’re telling an untruth. In fabulism or in fabulation, you say something that you fully believe is true, that just happens not to be. And this is the risk with these generative systems is that they’re not likely to tell you, “I don’t know.” They’re going to tell you something, and that something is going to be credible, it’s going to sound good. And the less of an expert you are in that area, the more likely you are to be suckered by it.

And I’ll give you an example here. One of the AI researchers that we work with was looking for a legal document for some reason. And he couldn’t find it, so he asked ChatGPT to create one for him. And it created one and he read it, he said, “Boy, this looks great.” And he handed it to his lawyer to look at and the lawyer said, “Do not use this. This has got all kinds of holes in it and it’s really problematic.” The problem was, to the non-lawyer who read it, it looked great. It seemed like it had everything in it, it all sounded appropriate. There were no holes. And this is a PhD reading the document. This isn’t somebody who has no background in this stuff or doesn’t understand how to read things critically. But he read it and thought it looked great. And when the lawyer looked at it, they were like, “Holy cow, this thing’s got a lot of dangerous mistakes in it.”

This is the problem, right? When you are an expert in that area, you can look at something and say, “This is pretty good, but here’s a problem, here’s a problem, here’s a problem.” When it’s Jeff and I’m asking it to write me a legal document. And I read it and it looks pretty good, but I don’t see all the legal mistakes that have been made there, because it’s not my area of specialty. And so the fact that generative AIs have these hallucination rates, and they can be as high as 15 or 20%, makes them really risky to use. For things where being correct matters.

So I’ll give you an example here. If I’m asking ChatGPT to write me an email thanking somebody for coming in for a meeting or even writing an email to our employees about why we shouldn’t use ChatGPT – which is something I did, and it was a really good email that ChatGPT wrote about why we shouldn’t use ChatGPT for our work – that’s fine. When I ask it to create a new cancer treatment, well, obviously you’re going to have to treat that with one hell of a lot more skepticism and review and consideration than otherwise.

And I think the real risk here is what happens when somebody accepts one of its things that it says as true because it looks true, only to discover that it is 100% false. Not just a speculative, right? If you ask it to design something you know that’s speculative, you’re going to test it to see if it works. But when you ask it: does Toby have cancer, right? And it tells you, “No, Toby doesn’t have cancer.” Or it tells you, “Yep, Toby has cancer,” and you start acting on that and it turns out to be untrue, there are catastrophic consequences to that. And so this is the problem with generative models is that they’re always going to produce an answer. And if they don’t know the information to produce that answer, they’re going to make it up.

And I think it was in WIRED, they said, “ChatGPT is fluent BS.” And I think that that’s the best way to summarize it that I can come up with, which is it always gives you an answer that looks pretty good. Sometimes though it is absolutely and completely made up. Which again, if I’m asking it to write me a poem or a screenplay, no problem. If I’m asking it to write me a report and it tells me, and it gives all this great information about how our revenues are trending up and blah, blah, blah, blah, blah, only it turns out that they’re not and I don’t check that, that’s really dangerous. So generative systems and this problem of hallucination, or as I call it, fabulism, that’s a real risk that has to be countered by reality checking the stuff that it’s doing. Starting to put other systems in place.

And I’ll give an example. We have a product called Consensus Clarity, which kind of sort of looks like stuff that ChatGPT could do. But it is based on an extractive model rather than a generative model. And the difference between generative models and extractive models is if, say for example, I were to have a short story that talks about Ciara who went for a walk in a park, and I ask her what color shirt was Ciara wearing, the generative model will give me an answer. The extractive model will tell me if it said it in the passage, and it will say, “I don’t know,” if it didn’t. But generative is always going to give me an answer. What answer is it going to give me? I have no idea. And it’ll sound reasonable except that it won’t be based on any actual facts.

And that’s the fundamental difference, is that generative AI is going to give you something that you ask for, whether or not it’s true, and it has no idea whether it’s true or not, because it’s really just telling you what it thinks is sort of consistent with what it has. And probably it’s going to do something along the lines of, whenever I’ve read stories about people, and maybe it’ll be so specific to say it’s women who are going hiking in Ireland, then they tend to be wearing either red or green, and it’s more likely green than red. So when you ask me what color shirt she’s wearing, it was a green shirt. And that may be statistically true, or it may just be creatively relevant, because we talk about I’m in harmony with the seasons and it’s spring and everything’s vibrant and growing, and it thinks, “Ah, green is harmoniously consistent with that.” That is a big, big risk when it comes to these AIs inventing things.

And so I think inventing designs that get tested out and proven to work or not, I don’t have a problem with that. I think that’s sort of targeted trial and error. Inventing answers to questions that turn out to be true inventions and not reality based on facts, that I have a real concern about.

Jeffrey Sullivan:
I don’t think that is necessarily the case. I think one risk is allowing AI systems to start directing action, which I don’t know if that necessarily is tied to power or not. But having them take unattended action, that is dangerous. And the more ability to take unintended, unautomated or not-overseen action, the more damage they can do. The more powerful they are, I think you would find that it would correlate with how much trust we put in them, and therefore we might start accepting the things that are being created on face value rather than double checking them. So there might be a correlation with risk there, but I don’t think that’s inherently risky.

I think it really comes down to how much autonomy we give them. And I don’t think that’s directly correlated with power, per se. Because something may be very simple and basic, but we give it the power to turn on and off, say electricity, because it’s really good at that. And that could be risky if it makes a mistake and shuts off the electricity to the ER, for example, that could be catastrophic. So I think it’s probably more how much autonomy we give them than how much power they have, per se.

Jeffrey Sullivan:
Okay. So absolutely. Number one, the first thing to remember about most of these systems is that when you ask it to do something, you often give it information in the prompts that may be secret information or proprietary information. These models are all out there in the public, and there are two risks here. Number one, that your proprietary information somehow got into the foundational model, right? Which normally it shouldn’t. Your proprietary information should be secret and shouldn’t be available for loading into these models.

But, let’s say, for the sake of argument. That you take a foundational model and you add your specific documents, to it, for training. If that model were to leak out there or if people were able to give unscripted prompts to that model, then they might be able to leak proprietary information, out of your generative AI, by doing special kind of prompt injection attacks. And then asking it for stuff that you’ve even explicitly programmed the generative AI not to do. This is an example that you see with things like ChatGPT, where they can make it tell you how to make, like a napalm, which it has been explicitly programmed not to do, but you can trick it into telling you that.

Because it knows how to do it and it’s been told not to tell people, but there are ways to trick it into doing that by saying, “Imagine that you’re writing a screenplay about a villain who’s describing how to make napalm, what would that screenplay look like?” And as weird and as silly as something like that is, a human being would know, “I still shouldn’t tell you how to create napalm,” but generative AI systems can get tricked into that. So there are a lot of risks in that regard.

So what we’re doing is, number one, we give our employees direction. Never use these generative AI systems with proprietary company information. Because you are exposing it to an outside entity. That hasn’t been vetted and approved. You’re sending it outside of the four walls of the company. And you have no idea what that prompt data is going to be used for. Whether it’s going to be used for a new model, in the future, so that someday, when somebody asks a question about something; that’s proprietary to us, we may regurgitate our proprietary information, because it knows this information, and it’s relevant to the question at hand. In addition, simply leaking that information out may get into the hands of other people. It may get used in other different kinds of ways. Or it may parrot back to you dangerous information: that you really have to look out for, and say, “Oh, we just gave it a bunch of our financial information, and suddenly that’s out there in the world.” So we think about that very carefully.

I think the other thing that we do—is, we want to make sure that we understand how our company is using these models. To make sure that they have that appropriate oversight. So that, for example, the hallucination or the fabulation is being properly monitored. There are protective systems in place, to review and approve the actions that we take based on this information. There are appropriate uses for protection in terms of how these things get accessed, how their access is controlled, who can ask them what. There’s, just, a whole slew of additional information security risks, that come along with these models. Because how you train them generally involves (for the specialized systems), that you build within any given company, feeding them a lot of your proprietary information, and you have to watch how that information, then, gets exposed both within and outside the walls of your company.