
Ollama is on PyPI

This week Ollama released a Python library on PyPI to go with their awesome tool for running LLMs on your own machine. You still need to download and run Ollama, but after that you can do almost everything from the library. In this blog post, we're going to take it for a spin.

I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:


You can install the library by running the following command:

pip install ollama

And then import the library from your Python REPL or Jupyter notebook:

import ollama

The most obvious first task is installing one of the models. At the moment there isn’t a way to list all of the available models, but you can see what’s available on the Ollama models page.

Installing a model

If we want to install the Mistral 7B model from Mistral AI, we could run the following:

ollama.pull('mistral')
There isn't any feedback while this is running, though, so I'd suggest running this type of command from the CLI instead, where you can see how much is left to download. Perhaps that will change in future versions of the library; we'll have to see. Once the model has downloaded, you can view its details like this:

ollama.show('mistral')

{
    'license': '...',
    'modelfile': '# Modelfile generated by "ollama show"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM mistral:latest\n\nFROM /Users/markhneedham/.ollama/models/blobs/sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730\nTEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""\nPARAMETER stop "[INST]"\nPARAMETER stop "[/INST]"',
    'parameters': 'stop                           "[INST]"\nstop                           "[/INST]"',
    'template': '[INST] {{ .System }} {{ .Prompt }} [/INST]',
    'details': {
        'parent_model': '',
        'format': 'gguf',
        'family': 'llama',
        'families': ['llama'],
        'parameter_size': '7B',
        'quantization_level': 'Q4_0'
    }
}
The chat function

There are two functions that you can use to interact with the models: chat and generate. Let's start with the chat function, which we can use like this:

stream = ollama.chat(
  model='mistral',
  messages=[{
    'role': 'user',
    'content': 'What is a Large Language Model in 3 bullet points?'
  }],
  stream=True
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

The output varies each time you run this, but this is what it showed when I ran it, taking around 3 seconds:

 1. A large language model is a type of artificial intelligence (AI) system designed to generate human-like text based on given prompts or context. It uses deep learning techniques, specifically recurrent neural networks and transformers, to analyze vast amounts of text data and learn patterns in language.
2. Large language models can perform various natural language processing tasks such as translation, summarization, question answering, and text generation. They can also be fine-tuned on specific datasets to improve performance in certain domains or applications.
3. The size of a large language model refers to the number of parameters it has, which determines its capacity to learn complex patterns and relationships in language. For example, BERT, a popular large language model, has over 110 billion parameters, making it one of the largest models to date. These models require significant computational resources and advanced hardware such as GPUs to train and run efficiently.

If we were building a chat app, we could keep an array of prompts and responses so that the LLM could use the full context when replying to the latest prompt.

The generate function

Alternatively, if you just want to do a one-off prompt and get a response, you can use the generate function. Multi-modal models are all the rage these days, so let's see if we can get the LLaVA model to describe the following image:

Figure 1. A colourful llama

file_path = "/Users/markhneedham/Downloads/02024_2475991309.png"

stream = ollama.generate(
  model='llava',
  prompt="Please describe what's in this image:",
  images=[file_path],
  stream=True
)

for chunk in stream:
  print(chunk['response'], end='', flush=True)

This one takes between 3 and 7 seconds to run, and the output is as follows:

 The image features a colorful stuffed animal, possibly a llama or an alpaca, wearing a pair of sunglasses and a feathery rainbow-colored scarf.  It is standing in front of a pink wall with green dots on it. This fun and playful scene appears to be the focus of the picture.

Configuring your own model

You can also create your own model based on any of the other models that you’ve downloaded or using any of the GGUF files from Hugging Face.

I'm going to make a more creative version of the Mistral model by setting the temperature to 0.99, as shown below:

modelfile = """
FROM mistral

PARAMETER temperature 0.99
"""

ollama.create(model='creative-mistral', modelfile=modelfile)

The new model will be created almost instantly and is called creative-mistral. We can then call that one like this:

stream = ollama.chat(
  model='creative-mistral',
  messages=[{
    'role': 'user',
    'content': 'What is a Large Language Model in 3 bullet points?'
  }],
  stream=True
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

This was the output when I ran it:

 * A large language model is a type of artificial intelligence (AI) system designed to understand and generate human-like text based on input data.

* It is trained on vast amounts of text data using deep learning techniques, allowing it to learn patterns, context, and relationships within language.

* Capable of generating coherent and contextually relevant responses or completions to textual prompts, they are used in various applications such as chatbots, content generation, translation, summarization, and more.


I’m super excited about this library being released and I think it makes Ollama even more useful than it already was. I’m definitely looking forward to playing around with this more over the coming weeks.
