
A large language model (LLM) is a type of machine learning model designed for natural language processing (NLP). These models have an extremely high number of parameters (up to trillions as of this writing) and are trained on vast amounts of human-generated and human-consumed data. Through this extensive training, LLMs learn to predict the syntax, semantics, and knowledge embedded in human language. This enables them to generate coherent and contextually relevant responses, giving the impression of intelligence.
Many LLMs in popular use today are foundation models—pre-trained on general, publicly available data to perform a wide range of tasks. While these models are powerful and often sufficient for many applications, they fall short in specialized domains or when dealing with private data. Key limitations include:
- Lack of domain-specific knowledge – The model’s training data may not include specialized topics (e.g., medical, legal, or financial fields).
- Inference limitations – Even when models are well-trained, they may not always generate the most accurate, relevant, or up-to-date responses.
- Hallucinations – The model may fabricate incorrect or misleading information when it lacks the required knowledge.
To address these challenges, we can customize the LLM. Here are five techniques, ordered roughly from least to most expensive.
Prompt Engineering
Prompt engineering is the process of designing effective prompts to guide an LLM’s response. It is the simplest, lowest-cost technique and requires no model modifications.
Key strategies, illustrated in the sketch after this list:
- Zero-shot prompting – Asking the LLM to answer without prior examples.
- Few-shot prompting – Providing a few examples in the prompt to improve accuracy.
- Chain-of-thought (CoT) prompting – Encouraging step-by-step reasoning to enhance complex problem-solving.
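As a rough illustration, here is how the three strategies differ in practice. The `llm` function is a hypothetical stand-in for whatever provider SDK you use; the prompts themselves are invented examples.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in: replace with your provider's completion API."""
    return "<model response>"

# Zero-shot: just the task, no examples.
zero_shot = ("Classify the sentiment of this review as positive or negative:\n"
             "'The battery died within a week.'")

# Few-shot: a handful of worked examples before the real input.
few_shot = ("Classify the sentiment of each review as positive or negative.\n"
            "Review: 'Great screen, fast shipping.' -> positive\n"
            "Review: 'Stopped working after two days.' -> negative\n"
            "Review: 'The battery died within a week.' ->")

# Chain-of-thought: ask the model to reason step by step.
chain_of_thought = ("A store sold 24 phones on Monday and twice as many on Tuesday.\n"
                    "How many phones were sold in total? Think step by step.")

for prompt in (zero_shot, few_shot, chain_of_thought):
    print(llm(prompt))
```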
However, prompt engineering provides limited control, relies solely on knowledge already in the model’s training data, and may still produce hallucinations.
Retrieval-Augmented Generation (RAG)
RAG integrates an external retrieval system with an LLM to retrieve relevant information before generating a response.
Process (sketched in code below):
- Prepare the augmentation data. This usually involves processing and indexing the data and storing it in a database (often a vector database).
- At query time, the system retrieves the most relevant data from the database.
- The retrieved data is appended to the prompt as additional context.
- The LLM generates an answer based on the provided context.
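A minimal sketch of this flow, using TF-IDF retrieval over an in-memory document list. Real systems typically use embedding models and a vector database, and `call_llm` here is a hypothetical client:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: prepare and index the augmentation data.
documents = [
    "Our premium plan includes 24/7 phone support.",
    "Refunds are processed within 5 business days.",
    "The basic plan does not include API access.",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

# Step 2: retrieve the most relevant documents for a query.
def retrieve(query: str, k: int = 2) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

# Steps 3 and 4: append the retrieved context to the prompt and generate.
def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)  # hypothetical LLM client
```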
This approach reduces hallucinations, keeps responses up to date, and allows for dynamic knowledge injection.
However, it requires an additional retrieval layer and adds latency to each request.
Parameter-Efficient Fine-Tuning (PEFT)
PEFT is a lighter, more efficient alternative to full fine-tuning (see below) that trains only a small fraction of the model’s parameters, often newly added ones, while the rest stay frozen.
Common techniques include:
- Low-Rank Adaptation (LoRA) – Adds trainable low-rank matrices to fine-tune a model with minimal computational overhead (the update is spelled out after this list).
- Adapters – Small, trainable modules added between model layers.
- Prefix tuning – Optimizes trainable prefix tokens prepended to the input rather than modifying model weights.
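To make the LoRA idea concrete: the original weight matrix W is frozen, and the update is learned as the product of two much smaller matrices, so the adapted layer computes roughly W' = W + BA, where B is d×r, A is r×k, and the rank r is far smaller than d or k. Training r×(d+k) adapter parameters instead of d×k full ones is what makes the approach parameter-efficient.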
Process (sketched in code below):
- Prepare a fine-tuning dataset. This is labeled, domain- or task-specific, and normally smaller than what full training or re-training requires.
- Integrate LoRA layers or adapters into the model.
- Train the layers/adapters. As with other deep-learning models, this involves cost functions, backpropagation, and evaluation.
- Test the performance of the LLM with the test data.
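Here is a minimal sketch using Hugging Face’s peft library; the model name, target modules, and hyperparameters are illustrative assumptions, not recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Attach low-rank adapters to the attention projection; the base
# weights stay frozen and only the small adapter matrices train.
config = LoraConfig(
    r=8,                       # rank of the update matrices
    lora_alpha=16,             # scaling factor applied to the update
    target_modules=["c_attn"], # gpt2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

From here, training proceeds much like the fine-tuning loop in the next section, except that gradients flow only into the adapter weights.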
Compared to full fine-tuning, PEFT costs less, trains faster, and reduces the risk of overfitting.
However, it may yield slightly lower accuracy than full fine-tuning.
Fine-Tuning
Fine-tuning involves modifying the LLM’s internal parameters by training it on additional labeled data, improving its performance in specific tasks or domains.
Characteristics:
- Requires model retraining with fine-tuning data, which is computationally expensive.
- Helps the model learn new knowledge permanently rather than relying on prompting.
- Used for task-specific training (e.g., legal text summarization, medical diagnosis).
Process (sketched in code below):
- Prepare a fine-tuning dataset. This is labeled, domain- or task-specific, and normally smaller than what full training or re-training requires.
- Train the LLM. As with other deep-learning models, this involves cost functions, backpropagation, and evaluation.
- Test the performance of the LLM with the test data.
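A minimal sketch with the Hugging Face Trainer; the base model, the JSONL files with a "text" field, and the hyperparameters are all illustrative assumptions:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# Step 1: prepare the labeled, domain-specific dataset.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Steps 2 and 3: train with a causal-LM objective, then evaluate.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
print(trainer.evaluate())  # held-out loss on the test split
```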
This yields strong performance improvements on specialized tasks.
The downsides are high computational cost and the potential for overfitting (becoming too specific to the training data) and catastrophic forgetting (losing general knowledge).
Re-Training
Re-Training refers to completely re-training a large language model from scratch or continuing pre-training using a new dataset. This is a resource-intensive process that involves training the model on massive amounts of domain-specific or updated data to improve its performance.
Process (sketched in code below):
- Prepare the training data. This is labeled and comprehensive.
- Train the LLM. As with other deep-learning models, this involves cost functions, backpropagation, and evaluation.
- Test the performance of the LLM with the test data.
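The mechanics are the same gradient loop as in fine-tuning, only at vastly larger scale. A toy sketch of the core steps, with a stand-in model and random token data in place of a real transformer and a massive corpus:

```python
import torch
import torch.nn as nn

vocab_size, seq_len, dim = 1000, 32, 64
model = nn.Sequential(           # stand-in for a full transformer stack
    nn.Embedding(vocab_size, dim),
    nn.Linear(dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()  # the cost function

for step in range(100):
    tokens = torch.randint(0, vocab_size, (8, seq_len))  # fake batch
    inputs, targets = tokens[:, :-1], tokens[:, 1:]      # next-token prediction
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # back-propagation
    optimizer.step()
```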
This is essentially the full experience and, when successful, results in a bespoke LLM.
The problem is the very high computational cost, along with all the risks of model development.
Choosing the Right Approach
Choosing the right approach depends on your requirements and budget. Careful evaluation will ensure the best balance between cost and performance. A sound strategy is to start with the least expensive technique that meets your requirements and escalate from there; in many cases, prompt engineering or RAG is sufficient.