Leveraging the capabilities of Large Language Models (LLMs) through APIs such as the OpenAI API is an easy way to add intelligence and advanced functionality to your applications. However, token costs add up quickly and can become quite expensive. Then there’s the nagging question of privacy and security. Finally, you’re limited in how much you can experiment and customize. But if you have a powerful machine with a GPU or two sitting around, wouldn’t it be great to use it to run one of those open source LLMs? Here’s how you can do it.
Ollama is a platform that provides tools and services for running and managing Large Language Models (LLMs) efficiently on local machines. It is specifically designed to support AI/ML workflows and offers an API-driven approach for interacting with these models. With Ollama, developers can deploy, interact with, and fine-tune LLMs using features like streaming responses and a common REST API.
Ollama supports many LLMs, with one of the most powerful (as of this writing) being Meta’s Llama 3.3:70b, which features 70 billion parameters. This model is designed for high-performance natural language understanding and generation. Ollama also supports other specialized and versatile models such as Llama 3.2 Vision, which adds visual processing capabilities alongside text-based functionality.
Some of these models require significant computational resources, so running them on hardware with sufficient GPU power is recommended. If you don’t have a GPU or need more GPU power, check out my post on renting GPUs with Vast.ai.
Step 1: Install Ollama
The first thing we’re going to do is install Ollama. On Linux, you can do this via:
curl -fsSL https://ollama.com/install.sh | sh
For Mac or Windows, you can download the installer from the Ollama website.
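Once the installer finishes, you can confirm that Ollama is available by running ollama --version from your shell, which should print the installed version.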
Step 2: Install a Model
Ollama supports many LLMs, including llama3.2-vision, a version of the Llama 3.2 model with visual processing capabilities. Let’s go ahead and pull it from the Ollama repo.
ollama pull llama3.2-vision
Let’s verify that it was pulled correctly.
ollama ls
It should show the following:
NAME                      ID              SIZE      MODIFIED
llama3.2-vision:latest    085a1fdae525    7.9 GB    9 minutes ago
Step 3: Run a Model
Now let’s run the model we just pulled.
ollama run llama3.2-vision
You should see the prompt:
>>> Send a message (/? for help)
Type “Hello, Ollama!” and it should respond. You can chat with it as you would with ChatGPT. Once you’re happy, type “/bye” to exit. We’re ready for the next step.
Step 4: Start the Ollama REST Service
Normally, the Ollama server is started at boot, but to be sure, let’s start it manually.
ollama serve
If it responds with:
Error: listen tcp 127.0.0.1:11434: bind: address already in use
Then the server is already running on port 11434.
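Before we touch any code, we can make a quick request to confirm the service is responding. Below is a minimal sketch that sends a text-only chat request to the local server; it assumes the model from Step 2 is installed and that axios is available, and the file name checkOllama.js is just an example.
// checkOllama.js - a quick sanity check of the local Ollama REST API
const axios = require('axios'); // npm install axios if you don't already have it

async function checkOllama() {
  try {
    // Send a simple text-only chat request to the local Ollama server
    const response = await axios.post('http://localhost:11434/api/chat', {
      model: 'llama3.2-vision',
      stream: false, // ask for the full reply as a single JSON object
      messages: [{ role: 'user', content: 'Hello, Ollama!' }],
    });
    // With streaming disabled, the reply text is in response.data.message.content
    console.log(response.data.message.content);
  } catch (error) {
    console.error('Could not reach Ollama:', error.message);
  }
}

checkOllama();
Run it with node checkOllama.js. If the model’s reply prints to the console, the service is up and we’re ready to build on it.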
Step 5: Utilize the REST Service
Remember our AI-Powered Business Card Reader? We will update it to use our local Llama 3.2 Vision model instead of an OpenAI model.
Let’s update the code to use the Ollama API. For more information on the endpoints, request, and response format, you can refer to the Ollama API Docs.
I called mine processCards-Ollama.js, or you can get it from the GitHub repo.
// Import required modules
const axios = require('axios'); // For making HTTP requests
const fs = require('fs').promises; // For handling files asynchronously
const path = require('path'); // For working with file paths

// Ollama API endpoint
const OLLAMA_API_URL = 'http://localhost:11434/api/chat';

// Check if file is an image
function isImageFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return ['.jpg', '.jpeg', '.png', '.gif'].includes(ext); // Adjust extensions as needed
}

// Send the image and prompt to Ollama
async function processImageFile(filePath) {
  try {
    // Read the image content as a base64-encoded string
    const imageBuffer = await fs.readFile(filePath);
    const base64Image = Buffer.from(imageBuffer).toString('base64');

    // Prepare the request payload
    const payload = {
      model: 'llama3.2-vision',
      stream: false,
      messages: [
        {
          role: 'user',
          content: "Identify the contact's name, title, email address, phone number, company name, industry, and website and output in a comma-delimited string.",
          images: [base64Image],
        },
      ],
    };

    // Send the request to Ollama
    const response = await axios.post(OLLAMA_API_URL, payload, {
      headers: {
        'Content-Type': 'application/json',
      },
    });

    const content = response.data.message.content;
    return content;
  } catch (error) {
    console.error('Error processing image:', error.message);
  }
}

// Append the result from Ollama to a CSV file
async function appendToFile(filePath, content) {
  try {
    await fs.appendFile(filePath, content + '\n', 'utf8');
    console.log('Content appended successfully!');
  } catch (err) {
    console.error('Error appending content:', err);
  }
}

// Go through each file in the folder
async function processFolder(folderPath) {
  try {
    const files = await fs.readdir(folderPath);
    for (const file of files) {
      const filePath = path.join(folderPath, file);
      if (isImageFile(filePath)) {
        const extractedData = await processImageFile(filePath);
        if (extractedData) {
          await appendToFile('contacts.csv', extractedData);
          console.log('Processed:', filePath);
        }
      } else {
        console.warn('Skipping non-image file:', filePath);
      }
    }
  } catch (error) {
    console.error('Error processing folder:', error);
  }
}

// Get the folder containing the business card image files
const folderPath = process.argv[2];
if (!folderPath) {
  console.error('Please provide a folder path as an argument.');
  process.exit(1);
}

processFolder(folderPath);
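Note that we set stream to false so that Ollama returns the entire reply in a single JSON object, which keeps the code simple. If you enable streaming, the response arrives as a series of JSON chunks that you would need to assemble yourself.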
Step 6: Run the Code
Run the code from your shell prompt:
node processCards-Ollama images
If everything works out right, you should have a contacts.csv file containing the business card information you wanted to extract.