Ollama: Running GGUF Models from Hugging Face
GGUF (GPT-Generated Unified Format) has emerged as the de facto standard file format for storing large language models for inference. We are starting to see a lot of models in this format on Hugging Face, many of them uploaded by The Bloke.
One cool thing about GGUF models is that it’s super easy to get them running on your own machine using Ollama. In this blog post, we’re going to look at how to download a GGUF model from Hugging Face and run it locally.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
There are over 1,000 models on Hugging Face that match the search term
GGUF, but we’re going to download the TheBloke/MistralLite-7B-GGUF model.
We’ll do this using the Hugging Face Hub CLI, which we can install like this:
pip install huggingface-hub
We can then download one of the MistalLite models by running the following:
huggingface-cli download \ TheBloke/MistralLite-7B-GGUF \ mistrallite.Q4_K_M.gguf \ --local-dir downloads \ --local-dir-use-symlinks False
Make sure you specify the name of the
gguf file that you want to download, otherwise, it will download all of them!
You can find a list of the model files to choose from on the files and versions page.
This file is over 4GB in size, so connect your ethernet cable if you’re going to try this at home.
Once it’s done, you’ll have the file in your
We’re then going to create a
Modelfile with the following contents:
We then build an Ollama model using the following command:
ollama create mistrallite -f Modelfile
And now let’s see if we can get the model to tell us all about the Grafana visualisation tool:
ollama run mistrallite "What is Grafana?"
The result you get will be different to what I get as these models are stochastic, but this is the output I got on one of the tries:
Grafana is an open source tool that allows the user to query, visualize and alert on time series data. It can be used as a drop in replacement for other monitoring tools such as Cacti, Munin or RRDtool. Grafana does not store any of the time series data itself but will connect to an external data source and display information in graphs and charts. Grafana’s flexible query editor allows users to extract the information they need from their time series data using the most popular query languages such as PromQL, InfluxDB Query Language (IL) and Graphite. It is even possible to create your own custom queries. Grafana also supports a number of plugins which can extend its functionality or connect it to other systems such as Slack, HipChat, PagerDuty and others. Why should I use Grafana? One major benefit of using Grafana is that it provides the user with the ability to visualize their time series data in a variety of ways including bar graphs, line charts, pie charts and more. This makes it easier for the user to identify patterns and trends in the data which can help them make better decisions. Another advantage of using Grafana is that it provides users with a centralized place where they can view all of their time series data from multiple sources. This makes it easy for users to compare data from different systems or even between different time periods. In addition, Grafana offers some unique features such as the ability to set alerts based on specific conditions being met in the data. This allows users to be proactive in identifying potential issues before they become a problem.
I don’t think this is the greatest model I’ve ever used, but it is cool that we now have a choice of over 1,000 models on Hugging Face to run on our own machines.