Model versioning
The llama.cpp community has produced several versions of the model format, including different quantization types. Be aware that not every backend supports every model version.
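As a rough illustration of why version mismatches matter, the sketch below copies the weight types each backend lists in the sections that follow into a small lookup table, then checks whether a backend supports a given type. The struct backend_support, the table SUPPORT, and the function backend_supports are hypothetical and hand-copied from those lists. The integer types (I8, I16, I32) are left out on the assumption that they are not used for model weights, and the lists are snapshots that may not match current releases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compatibility table, hand-copied from the type lists in this
 * document (llama.cpp ggml.h, the llm-rs Type enum, and rwkv.cpp's
 * FORMAT_TYPE_TO_GGML_TYPE). A snapshot, not an authoritative list.
 * Integer types are omitted on the assumption they are not weight formats. */
struct backend_support {
    const char *backend;
    const char *types[8]; /* NULL-terminated list of supported type names */
};

static const struct backend_support SUPPORT[] = {
    {"llama.cpp", {"F32", "F16", "Q4_0", "Q4_1", "Q4_2", "Q4_3", "Q8_0", NULL}},
    {"llm-rs",    {"F32", "F16", "Q4_0", "Q4_1", NULL}},
    {"rwkv.cpp",  {"F32", "F16", "Q4_0", "Q4_1", "Q4_1_O", "Q4_2", "Q4_3", NULL}},
};

/* Returns 1 if the named backend lists the given type, 0 otherwise
 * (including when the backend name is not in the table). */
static int backend_supports(const char *backend, const char *type) {
    size_t n = sizeof SUPPORT / sizeof SUPPORT[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(SUPPORT[i].backend, backend) != 0) {
            continue;
        }
        /* Walk the NULL-terminated type list for this backend. */
        for (size_t j = 0; SUPPORT[i].types[j] != NULL; j++) {
            if (strcmp(SUPPORT[i].types[j], type) == 0) {
                return 1;
            }
        }
        return 0;
    }
    return 0;
}
```

For example, backend_supports("llm-rs", "Q4_2") returns 0, while the same query for "rwkv.cpp" returns 1, so a Q4_2 model that loads in rwkv.cpp would not load in llm-rs.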
llama.cpp
For llama.cpp, you can check the supported model types in the ggml_type enum in ggml.h:
enum ggml_type {
    // explicitly numbered values are used in llama.cpp files
    GGML_TYPE_F32  = 0,
    GGML_TYPE_F16  = 1,
    GGML_TYPE_Q4_0 = 2,
    GGML_TYPE_Q4_1 = 3,
    GGML_TYPE_Q4_2 = 4,
    GGML_TYPE_Q4_3 = 5,
    GGML_TYPE_Q8_0 = 6,
    GGML_TYPE_I8,
    GGML_TYPE_I16,
    GGML_TYPE_I32,
    GGML_TYPE_COUNT,
};
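The comment in ggml.h says the explicitly numbered values are the ones used in llama.cpp files. The sketch below, assuming a type ID read from a file is a raw ggml_type value from this snapshot, turns such an ID into a readable name and rejects IDs the snapshot does not define. GGML_FILE_TYPE_NAMES and ggml_file_type_name are hypothetical names, not part of ggml.

```c
#include <stddef.h>

/* Hypothetical table: names of the explicitly numbered ggml_type values in
 * the ggml.h snapshot above, indexed by numeric value (0..6). Other ggml
 * versions may number their types differently. */
static const char *const GGML_FILE_TYPE_NAMES[] = {
    "F32",  /* 0 */
    "F16",  /* 1 */
    "Q4_0", /* 2 */
    "Q4_1", /* 3 */
    "Q4_2", /* 4 */
    "Q4_3", /* 5 */
    "Q8_0", /* 6 */
};

/* Returns the type name for a numeric type ID (assumed to be a raw
 * ggml_type value read from a model file), or NULL if this snapshot
 * does not define that ID as an explicitly numbered type. */
static const char *ggml_file_type_name(long id) {
    size_t n = sizeof GGML_FILE_TYPE_NAMES / sizeof GGML_FILE_TYPE_NAMES[0];
    if (id < 0 || (size_t)id >= n) {
        return NULL;
    }
    return GGML_FILE_TYPE_NAMES[id];
}
```

With this table, ID 2 maps to "Q4_0", while ID 7 (the implicitly numbered GGML_TYPE_I8) and any negative ID yield NULL.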
llm-rs
For llm-rs, you can check the supported model types in the Type enum of the llm-rs ggml bindings:
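// #[default] on a variant only compiles together with #[derive(Default)].
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Type {
    /// Quantized 4-bit (type 0).
    #[default]
    Q4_0,
    /// Quantized 4-bit (type 1); used by GPTQ.
    Q4_1,
    /// Integer 32-bit.
    I32,
    /// Float 16-bit.
    F16,
    /// Float 32-bit.
    F32,
}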
llm-rs also supports legacy llama.cpp models.
rwkv.cpp
For rwkv.cpp, you can check the supported model types in the FORMAT_TYPE_TO_GGML_TYPE table in the rwkv.cpp source:
static const ggml_type FORMAT_TYPE_TO_GGML_TYPE[7] = {
    GGML_TYPE_F32,
    GGML_TYPE_F16,
    GGML_TYPE_Q4_0,
    GGML_TYPE_Q4_1,
    GGML_TYPE_Q4_1_O,
    GGML_TYPE_Q4_2,
    GGML_TYPE_Q4_3
};
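The same type name does not sit at the same position in every backend: Q4_2 has value 4 in the llama.cpp ggml.h enum but is entry 5 in this rwkv.cpp table, because rwkv.cpp inserts Q4_1_O. The sketch below translates an rwkv.cpp format type (assumed here to be the index into FORMAT_TYPE_TO_GGML_TYPE) into the numeric llama.cpp ggml_type by matching names rather than numbers. It also assumes that a type with the same name has the same layout in both codebases, which this page does not state. RWKV_FORMAT_NAMES, LLAMA_GGML_NAMES, and rwkv_format_to_llama_type are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Type names in rwkv.cpp format-type order, copied from the
 * FORMAT_TYPE_TO_GGML_TYPE table (index assumed to be the format type). */
static const char *const RWKV_FORMAT_NAMES[] = {
    "F32", "F16", "Q4_0", "Q4_1", "Q4_1_O", "Q4_2", "Q4_3",
};

/* Names of the explicitly numbered llama.cpp ggml_type values (0..6). */
static const char *const LLAMA_GGML_NAMES[] = {
    "F32", "F16", "Q4_0", "Q4_1", "Q4_2", "Q4_3", "Q8_0",
};

/* Hypothetical helper: maps an rwkv.cpp format type to the llama.cpp
 * ggml_type with the same name (assuming same name means same layout).
 * Returns -1 if the index is out of range or llama.cpp has no such type. */
static int rwkv_format_to_llama_type(int format_type) {
    size_t n_rwkv = sizeof RWKV_FORMAT_NAMES / sizeof RWKV_FORMAT_NAMES[0];
    size_t n_llama = sizeof LLAMA_GGML_NAMES / sizeof LLAMA_GGML_NAMES[0];
    if (format_type < 0 || (size_t)format_type >= n_rwkv) {
        return -1;
    }
    const char *name = RWKV_FORMAT_NAMES[format_type];
    /* Match by name, since the numeric positions differ between backends. */
    for (size_t id = 0; id < n_llama; id++) {
        if (strcmp(LLAMA_GGML_NAMES[id], name) == 0) {
            return (int)id;
        }
    }
    return -1;
}
```

For example, rwkv_format_to_llama_type(5) returns 4 (Q4_2), while rwkv_format_to_llama_type(4) returns -1 because the llama.cpp enum has no Q4_1_O. Matching by name keeps the mapping correct even though the two tables order their types differently.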