- Jack Lau
Training ChatGPT and Deploying on Discord using the Listing Rules of the Stock Exchange as a corpus
The power of ChatGPT is still being discussed feverishly in town. Our engineers today came up with a thought experiment. What if we could gradually train ChatGPT on some "cases" and some "rules and regulations"? And once it is trained, could we use the chatbot as an A.I. assistant? At this point we are not interested in whether this will replace a human; we merely hope that the process is easy enough to give us a useful assistant. We are interested in:
a. How easy is it to train, in terms of programming code, training cost, and running cost?
b. How can we deploy it?
We are well aware that the system would improve vastly with more training data and time.
We picked the Listing Rules of the Hong Kong Stock Exchange as an example because the document is well structured and robust. The file can be found at https://en-rules.hkex.com.hk/rulebook/main-board-listing-rules . As a PDF, it runs to 1,206 pages. As a text file, the rules consist of 378,796 words and 1.9 million characters. While this is certainly not a huge document by computing standards, it is lengthy enough that a human would need a while to read it and even longer to digest it.
The result is shown below in a video. The whole training process took less than one minute and cost us USD 0.21. (In computer jargon, we used 545,751 tokens.)
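As a rough sanity check, the figures quoted in this article can be combined into a few ratios. The sketch below uses only the reported numbers (378,796 words, about 1.9 million characters, 545,751 tokens, USD 0.21). The variable names are ours, and the per-token rate it prints is simply what those totals imply, not a published price.

```python
# Figures as reported in this article.
WORDS = 378_796
CHARACTERS = 1_900_000  # quoted as "1.9 million", so only approximate
TOKENS = 545_751
COST_USD = 0.21

# Ratios implied by the reported totals.
tokens_per_word = TOKENS / WORDS
chars_per_token = CHARACTERS / TOKENS
usd_per_1k_tokens = COST_USD / (TOKENS / 1000)

print(f"tokens per word:      {tokens_per_word:.2f}")
print(f"characters per token: {chars_per_token:.2f}")
print(f"USD per 1,000 tokens: {usd_per_1k_tokens:.5f}")
```

In other words, the corpus works out to roughly 1.4 tokens per word and about 3.5 characters per token.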
We then deployed the specially trained ChatGPT on Discord, a very popular platform for bots. (Incidentally, Microsoft was reported to have made an offer to buy Discord for US$12 billion in 2021. https://www.theinformation.com/articles/discord-ceo-proves-theres-life-without-microsoft )
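On the Discord side, the glue code mostly has to decide which messages are questions for the bot and keep each reply within Discord's 2,000-character message limit. The sketch below shows that logic in plain Python, without any Discord library. The `!ask` prefix and both function names are hypothetical choices for illustration, not what our bot actually uses.

```python
DISCORD_MESSAGE_LIMIT = 2000  # Discord's cap on characters per message
COMMAND_PREFIX = "!ask"       # hypothetical prefix; any prefix would do


def extract_question(message_text):
    """Return the question after the prefix, or None if not meant for the bot."""
    text = message_text.strip()
    if not text.startswith(COMMAND_PREFIX + " "):
        return None
    return text[len(COMMAND_PREFIX):].strip() or None


def split_reply(reply, limit=DISCORD_MESSAGE_LIMIT):
    """Cut a long reply into consecutive pieces of at most `limit` characters."""
    return [reply[i:i + limit] for i in range(0, len(reply), limit)]


print(extract_question("!ask What is a sponsor?"))
print([len(part) for part in split_reply("x" * 4500)])
```

A real bot would call these two helpers from its message handler and send each piece as a separate message.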
In this implementation, we first took the PDF file of the Listing Rules and converted it into a text file. The text file was then dissected into smaller pieces, which were randomized. The small pieces were then "embedded" through training within ChatGPT. This trained version of ChatGPT was then connected to Discord.
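The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code we ran: it assumes the PDF has already been converted to text, and `toy_embed` is a stand-in that counts words instead of calling a real embedding model. All function names, the chunk size, and the sample text are made up for the example.

```python
import math
import random
import re
from collections import Counter


def split_into_chunks(text, words_per_chunk=200):
    """Dissect the text into pieces of at most `words_per_chunk` words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def toy_embed(piece):
    """Stand-in embedding: lowercase word counts (NOT a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", piece.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text, words_per_chunk=200, seed=0):
    """Chunk the text, randomize the order, and embed every piece."""
    chunks = split_into_chunks(text, words_per_chunk)
    random.Random(seed).shuffle(chunks)
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def most_relevant(question, index, top_k=1):
    """Return the `top_k` pieces most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up placeholder text, not taken from the Listing Rules.
sample = (
    "Directors must disclose their interests without any delay. "
    "Annual reports must be published after year end. "
    "Trading may be suspended pending a formal announcement."
)
index = build_index(sample, words_per_chunk=8)
print(most_relevant("When must annual reports be published?", index))
# ['Annual reports must be published after year end.']
```

In a setup like ours, the pieces returned by a lookup of this kind would be handed to the chatbot together with the user's question.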
As shown in the video above, we tried to tease the chatbot by insisting that it answer the questions in a certain format. One such request was to explain something "as if I were a 10 year old". In another example, we insisted that the chatbot answer the question in 3 bullet points.
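One simple way to impose such formats is to append the instruction to the prompt before it goes to the chatbot. The sketch below assumes a prompt made of retrieved context, the question, and an optional style rule. The `STYLES` presets, the `build_prompt` helper, and the prompt wording are hypothetical and only mirror the two examples mentioned here.

```python
# Hypothetical style presets mirroring the two examples in the text.
STYLES = {
    "eli10": "Explain your answer as if I were a 10 year old.",
    "bullets3": "Answer in exactly 3 bullet points.",
}


def build_prompt(question, context, style=""):
    """Combine retrieved context, the question, and an optional style rule."""
    parts = [
        "Answer the question using the context below.",
        f"Context:\n{context}",
        f"Question: {question}",
    ]
    if style:
        parts.append(STYLES[style])
    return "\n\n".join(parts)


# The context string is a placeholder for whatever pieces were retrieved.
print(build_prompt("What is a sponsor?", "<retrieved pieces go here>", style="bullets3"))
```

Keeping the style rule separate from the question makes it easy to ask the same question in several formats and compare the answers.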
There is more we can do, including adding numbers drawn from the text. For questions that go beyond the Listing Rules, we could let the chatbot fall back on its default model, which of course has been trained with far broader scope and intensity.
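A fallback of this kind could be as simple as a threshold on how well the best-matching pieces fit the question. The sketch below is one possible design, not something we have built: the `route_question` helper, the 0.25 threshold, and the source labels are all assumptions, and the similarity scores are taken as given from whatever lookup step produced them.

```python
def route_question(scored_pieces, threshold=0.25):
    """Decide where an answer should come from.

    `scored_pieces` is a list of (piece, similarity) pairs. Pieces scoring
    at or above `threshold` are kept; if none qualify, the question is
    routed to the general model instead.
    """
    relevant = [piece for piece, score in scored_pieces if score >= threshold]
    if relevant:
        return "listing_rules", relevant
    return "general_model", []


print(route_question([("piece A", 0.62), ("piece B", 0.10)]))
# ('listing_rules', ['piece A'])
print(route_question([("piece A", 0.05)]))
# ('general_model', [])
```

The right threshold would have to be tuned by trial, since it depends on how the similarity scores are computed.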
As a final word, this was done purely as an engineering experiment and is not intended to substitute for professional advice. It was prepared as an illustrative example for a Master's course we are teaching at the university.