OpenAI's Competition: Free Open Source ChatGPT Alternative Challenges Status Quo

2024/3/28
No Priors AI

ChatGPT is obviously an incredibly powerful tool, but that being said, would ChatGPT die if, for example, someone were to release essentially a ChatGPT that was open source?

The idea being that anyone would be able to take this open source ChatGPT model, train it on any of their own data, or perhaps it already has stock data that it's trained on, and then be able to use this for commercial purposes. If something like that came out, would that kill ChatGPT? Today we're going to find out, because Databricks has just released Dolly 2.0, which is a text-generating AI model that can power apps like chatbots, text summarizers, and basic search engines. It's the second version; Databricks actually released an original version of Dolly back in March. And what's really, really important here is that it's open source and it's licensed to allow independent developers and companies to use it for commercial purposes.

So you don't have to pay a royalty, and you can use this to make money. Other ones have come out, but they have just been for researchers and for AI research purposes. Now why is Databricks, which is a company that really makes all of its money from data analytics, open sourcing an AI model? The CEO, Ali Ghodsi, says that it is purely for philanthropy. He was quoted as saying, "We are in favor of more open and transparent large language models in the market in general, because we want companies to be able to build, train, and own AI-powered chatbots and other productivity apps using their own proprietary datasets."

And it's interesting because he said, "We might be the first, but I hope not to be the last." So he's not even hoping that theirs is the big project. He just wants to set a precedent where people start building these.

This is really interesting because obviously, with OpenAI receiving ten billion dollars from Microsoft, this area is hotly contested. A lot of big tech companies are coming in here. A lot of money is going this way.

So it's really interesting to see a company completely open sourcing a project like this. While it does sound incredibly philanthropic, I guess you could say, for them to make this free open source model, you also have to think a little bit about what the upside would be for Databricks, or why they're doing it.

You know, this takes a lot of money to build, fine-tune, and release, so why would they be doing it? The CEO mentioned in an article recently that he hoped that developers who build using Dolly 2.0, who are building apps with it, are building using Databricks. But to his original point, it's indeed one of the first ChatGPT-like models available without major usage restrictions. So you don't have to use Databricks to build a tool here.

Obviously, that's the hope. And there is no way that they're forcing people to do this. I just think they are hoping that there's a lot of goodwill from them creating this powerful model, and that people will use it.

So this is kind of one of the first-generation models that came out like this. The first generation came out a while ago; this is the second generation of that. But what's really interesting is that the first-generation model, and actually a lot of these different AI models that have come out, were originally trained on outputs from OpenAI.

Even Google has gotten swept up in this controversy of training a model off of outputs from OpenAI, which is a clear violation of OpenAI's terms of service. The first version of Dolly that Databricks released, that's what it did. And of course, that's another reason why they had to make it open source and free: the original was trained on OpenAI outputs, so it would be illegal for them to monetize that. Now, with the second version, it is not trained on OpenAI.

They said that this new version is trained on their own proprietary data. They said that they created it on a training set with about fifteen thousand records generated by thousands of Databricks employees who voluntarily contributed files to it, and that fifteen-thousand-record set was used to guide this open source text generation model called GPT-J-6B. There's a nonprofit research group called EleutherAI, and essentially this is all coming out of that. The CEO of Databricks does admit that Dolly 2.0 has a few limitations. Well, a lot of limitations; this thing's pretty new. One of those is that it only does English, and it can be toxic and offensive in its responses, which is another very interesting aspect of this. They've trained this model, and it's kind of like the out-of-the-box stock model of an AI.

Obviously OpenAI has done a ton of work to fine-tune the responses that come out of it to be more politically correct, or to not say things that it shouldn't, whatever. And that has gotten a lot of criticism in itself, because depending on what side of the political spectrum you are on, you may or may not like different responses that ChatGPT will give you. That being said, Dolly 2.0 did not receive a lot of those different fine-tunes. It's more out in the wild, and you're getting something that's a little bit more of a rough draft.

Some people are criticizing that as, you know, dangerous. Some people are saying they would just prefer the stock AI model. So it's really interesting, but that is what's happening with that right now. There are a couple of examples I've seen online where they asked it some questions. You know, everyone is trying to get to the political stuff on there. They asked it about women in the workforce, and it had some false info that it threw in there, some fake statistics.

It was overall really positive about women working, but the numbers it had were wrong, so anyway, people criticized that. It was also asked about Donald Trump and whether he was responsible for January 6, and it just came up with a bunch of facts about how he went to jail, something about a war, and how he built a wall between America and Mexico, which is not super accurate. So obviously it gets some things wrong, like OpenAI did. It's going to be interesting to see if open sourcing this allows it to be fine-tuned and become better, or whether new versions, as they come out, will allow it to be more accurate. These are a lot of the same problems OpenAI had early on.

But it is really incredibly impressive. And I think people are definitely overlooking how big of a deal it is to have an open source large language model like this. While it does have its limitations, those can be fine-tuned by companies and other people in the future, or in future updates.

But despite its limitations, any company can grab this out of the box and build their own in-house AI models off their own personal dataset, and this is incredibly powerful for really democratizing AI at this point and not having it all be held in the hands of a few powerful companies, Microsoft and Google in particular. It is interesting, though, because open sourcing can open a whole can of worms, right? When you have people taking your code, forking it, changing it, and modifying it, which is what happens with open source, you don't really get all of the security that you would have in a closed source project. So essentially people could introduce dangerous code into it, and hackers and other people can use it for malicious activity.

So there is that downside, and some are saying that it is going to scare off different businesses and keep them away. However, some businesses are using this. The telecom giant Verizon is testing Dolly to let their engineers ask questions about documents stored on Confluence, the collaboration platform for onboarding and planning. And the CEO of Databricks did say, "We're freeing Dolly because we believe open sourcing models is the best way forward. It gives researchers the ability to freely scrutinize the model architecture, helps address potential issues, and democratizes LLMs so that users aren't dependent on costly proprietary large-scale models."

This is really interesting, right? One aspect is that researchers have the ability to freely scrutinize the model. This is something that OpenAI has been criticized for, because it's essentially a black box. And you can understand why, right? They don't want people to go see how they built it, because it's proprietary info and they don't want people to clone them.

But I mean, come on, everyone's cloning them now. Here's this open source one. Everyone's kind of figuring it out, so I don't know if that's really a good enough excuse anymore, because people really want to know: what is the AI, and what are the biases that might be introduced by engineers or developers, et cetera?

And what are the datasets? People have a lot of questions about these. A lot of people just want transparency. They want to know what data was used, what safeguards were put in place, and what bias may have been added. So it's really interesting that we're now getting these open source ones that are essentially going to get around that.

By open sourcing it, I would say that Databricks is also attempting to wash its hands of the liabilities that come along with this, right? Because if you open source it, you're like, now everyone can monitor it, everyone can make changes and adjustments to the codebase, et cetera, et cetera. So some people are saying this is a little bit less appealing for businesses. But maybe this is a smart move on Databricks' part, because if they were trying to actually make one in-house that was competing against ChatGPT, they wouldn't be able to do that. But by making it open source, a lot of people can contribute and work on it.

And here's a cool thing about open sourcing: if companies are grabbing it and throwing it into their own codebase, and it's open source, those companies are going to be incentivized to help maintain and improve the software. So it's going to be really interesting to see what happens if this gets wide-scale adoption.

I think that you are going to be able to see these things really improve. Even the CEO of Databricks said you should expect both a continued investment in open source as well as innovations that help accelerate the application of LLMs to key business challenges. So it would appear as though Databricks is continuing to be committed to this project and these types of projects in the future. And if this thing can really take off, this could be a very powerful challenge to OpenAI. This might be able to take them on in a way that Google or others are at the moment struggling to do. So it's going to be really interesting to watch this space and see what happens in the future.