AI chat models at the command line – continued

August 7, 2023

As described in the previous blog post, AI chat models at the command line, I have applied the procedure that was used for the 7B LLaMA model to the 65B model. This is still on the same Mac running macOS Ventura, with a 6-core Intel i5 chip and 16 GB RAM. The files and models are kept on a 1 TB Samsung T7 SSD drive.

The procedure is exactly the same for the 65B model as it was for the 7B model, but the files are much bigger. There are 81 data files, labeled from pytorch_model-00001-of-00081.bin to pytorch_model-00081-of-00081.bin, of about 1.5 GB each. See more useful instructions in Appendix 2.

The combined model file was 122 GB, and the final “4-bit quantized” model file was 34 GB. That quantized file is what is used as the model to ask questions.
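
For reference, and assuming the same llama.cpp toolchain described in the previous post, the combine-and-quantize steps look roughly like this (a sketch; the file names and paths here are illustrative, not necessarily the exact ones used):

# Convert the 81 HF shards into a single f16 model file (the ~122 GB file)
python3 convert.py ./llama-65b-hf --outtype f16 --outfile ./models/65B/ggml-model-f16.bin
# Quantize to 4-bit q4_0 (the ~34 GB file used for chatting)
./quantize ./models/65B/ggml-model-f16.bin ./models/65B/ggml-model-q4_0.bin q4_0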

The speed of this model on such hardware is, not surprisingly, excruciatingly slow. The default name of “Bob” used by the 7B model was changed to “ChatLLaMa” for this model. The very first set-up took about 7 hours to be “primed”, running from Friday around 6 pm to Saturday around 1 am.

NOTE: The slowness of the answers is due to the hardware. While the Mac was using about 2 CPU cores (roughly between 130% and 160%, i.e. not at full capacity), the slowness can be attributed to the speed of the SSD drive and the fact that the model data is larger than the available RAM.
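
For readers who want to check this on their own machine, macOS ships with command-line tools to watch CPU and disk activity (a quick sketch, not part of the original procedure):

# Sort processes by CPU usage; a value above 100% means more than one core in use
top -o cpu
# Report disk throughput every 2 seconds, to watch how hard the SSD is being read
iostat -w 2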

First question

The very first question was about a Python “Hello World” script, as the 7B model had refused to answer it. Even though I made a typo, forgetting one blank space, the answer was suitable. However, writing the answer took over an hour. The first 20 minutes were a slow writing of “ChatLLaMa:” one letter at a time…

USER: what  pythoncode for creating a “hello World” script?

ChatLLaMa: import sys print(“Hello world!”)

“As is” this answer would not work, as it should be on two separate lines. Additionally, the print statement would suffice all by itself.
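
For completeness, here is the answer put on two separate lines so that it actually runs (the import is unnecessary but harmless):

import sys              # not needed; print() is a built-in
print("Hello world!")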

I can see that there is a lot of reading happening on the T7 drive. The LLaMA model has to process the 34 GB file multiple times during its “thinking”, which implies a lot of reading and possibly swapping of data with the Mac’s internal drive.

Second question

The second question is still going as I am writing… It has been going for 2.5 hours, and this is where we are:

USER: what is a large language model?

ChatLLaMa: A language model (LM) in computational linguistics is an algorithm for assigning probabilities to sequences of words, such as sentences. Language models are used in natural language processing

This may take a few more hours… In the meantime, I asked ChatGPT 3.5 the same question. The answer took about 2 or 3 seconds and is quite lengthy but very informative. See Appendix 1 below for that answer.

It took 1 hour to finish the sentence:

, speech recognition, bioinformatics, and other applications.

It took almost 10 minutes for the comma before the word “speech” to be printed. The word “bioinformatics” was formed in steps: first “bio”, then the rest of the word arrived a few letters at a time until “bioinformatics” was complete.

Third question

Reprising a question that the 7B model had left unanswered:

USER: where is the moon?   

The first 20 minutes served to write, one letter at a time, the name of the model, “ChatLLaMa:”, including the ending colon. To be fair, the first word, “The”, arrived at just about the 20-minute mark. The word “Moon” arrived 5 minutes later…

After 1 hour the reply looks like:

ChatLLaMa: The Moon orbits Earth at an average distance of 38

One hour and 5 minutes later the phrase says:

ChatLLaMa: The Moon orbits Earth at an average distance of 384,400 km (239,000 mi), or

The final phrase took just about 3 hours to be written…

ChatLLaMa: The Moon orbits Earth at an average distance of 384,400 km (239,000 mi), or 1.28 light-seconds.

The answers are concise, unlike those of ChatGPT, even the free 3.5 version. The bottom line is that this LLaMA works, but it is very limited by the hardware.

Fourth question

For the 4th and last question I reused one from the 7B model. It took two and a half days to write! But the answer was much more elaborate and probably much closer to the truth. As before, it took about 20 minutes just to write “ChatLLaMa:”, another 30 minutes to get to “Snow White – 1853 is”, and 24 more hours to get to ‘medieval German poem “Sneew‘, which again shows how slowly the words are written.

USER: in which book can we read about snow white?

ChatLLaMa: Snow White is a fairy tale known from many countries in Europe, the best known version being the German one collected by the Brothers Grimm. The German version features elements such as the magic mirror, the seven dwarfs, and the glass coffin, that were added to the original story once it began to be told by French storytellers of the 19th century, such as Madame d’Aulnoy. In the first edition of 1812 of the Brothers Grimms’ collection Kinder- und Hausmärchen, the story was titled Little Snowdrop and numbered KHM53. The change to the more familiar title took place in the second edition of 1819.
The earliest attested form of the story is the medieval German poem “Sneewittchen”, which also inspired a completely separate and unrelated 1697 operatic adaptation with music by Augusto Setti and text by Giuliana Francesca Cortese. This version eventually reached the attention of Jacob and Wilhelm Grimm who made their own version publicly available in Household Tales (German: Nachtengalen) in 1812. It was revised in later editions. The name of the heroine was changed to “Schneeweißchen” (“Little Snow White”) in the seventh edition of their tales (1857).

Appendix 1

Answer to the same question, “what is a large language model?”, by ChatGPT 3.5

(Note: this was printed in less than 5 seconds, perhaps only 2 or 3!)

A large language model is a type of artificial intelligence (AI) model designed to understand and generate human-like text based on the patterns it has learned from vast amounts of training data. These models are built using deep learning techniques, particularly using architectures like transformers, which excel at capturing and understanding complex relationships in data sequences, such as text.

The term “large” in “large language model” refers to the massive scale of these models in terms of the number of parameters they possess. Parameters are the adjustable internal values that the model uses to make predictions. More parameters enable the model to learn intricate patterns and nuances in language, making its generated text more coherent, contextually relevant, and natural-sounding.

These large language models are trained on a wide variety of text sources, including books, articles, websites, and more, allowing them to develop an understanding of grammar, vocabulary, context, and even some level of common sense reasoning. They can be fine-tuned for specific tasks, like translation, question answering, text generation, and more.

Large language models like GPT-3 (Generative Pre-trained Transformer 3) have demonstrated remarkable capabilities in generating coherent and contextually relevant text, answering questions, providing explanations, and even engaging in conversations. They have found applications in areas such as content generation, customer support, virtual assistants, programming assistance, and more.

Appendix 2

There are 81 “bin” files to be downloaded, and at about 1.5 GB each that takes ample time. I therefore asked ChatGPT for help to devise a for loop with wget to download them all without manual intervention. All the other files are small enough, or can be added to a shell script in addition to the loop.

The ChatGPT loop suggestion was:

for i in {1..81}; do
   number=$(printf "%05d" $i) # Format the number with leading zeros
   filename="pytorch_model-${number}-of-00081.bin"
   wget "https://huggingface.co/decapoda-research/llama-65b-hf/resolve/main/${filename}"
done
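
One small improvement worth noting (my addition, not part of ChatGPT’s suggestion): wget’s -c option resumes a partially downloaded file, which helps if a multi-hour download of 1.5 GB files is interrupted:

for i in {1..81}; do
   number=$(printf "%05d" $i) # Format the number with leading zeros
   filename="pytorch_model-${number}-of-00081.bin"
   # -c resumes a partial download instead of starting over
   wget -c "https://huggingface.co/decapoda-research/llama-65b-hf/resolve/main/${filename}"
done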
