Get Started
Large language models on Node.js.
This project is at an early stage; the Node.js API may change in the future, so use it with caution.
Prerequisites
Node.js version 16 or above
(Optional) TypeScript: when you want statically typed interfaces
(Optional) Python 3: when you need to convert .pth models to the GGML format
(Optional) Rust/C++ compiler toolchains: when you need to compile from source
Rust for building the Rust Node API
CMake for building the llama.cpp project
A Clang/GCC/MSVC C++ compiler for compiling the native C/C++ bindings; choose one of:
- build-essential for Ubuntu (run apt install build-essential)
- Xcode for macOS (run xcode-select --install)
- Visual Studio for Windows (install the C/C++ components)
Compatibility
Currently supported model families (all must be converted to the GGML format): LLaMA and its derivatives, and RWKV.
Supported platforms:
- darwin-x64
- darwin-arm64
- linux-x64-gnu (glibc >= 2.31)
- linux-x64-musl
- win32-x64-msvc
Node.js version: >= 16
Installation
- Install the llama-node npm package
npm install llama-node
Install at least one of the inference backends:
- llama.cpp
npm install @llama-node/llama-cpp
- or llm-rs
npm install @llama-node/core
- or rwkv.cpp
npm install @llama-node/rwkv-cpp
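After installing, you can check that the packages resolve before writing any code. This is just an optional sanity check that the modules load; it exercises no model or inference API:
node -e "import('llama-node').then(() => console.log('llama-node ok'))"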
Getting a Model
For LLaMA and its derived models:
llama-node uses llm-rs/llama.cpp under the hood and consumes the model formats (GGML/GGMF/GGJT) derived from llama.cpp. Because Meta released the original model for research purposes only, this project does not provide model downloads. If you have obtained the original .pth model, please read the documentation and use the conversion tool provided by llama.cpp to convert it, as sketched below.
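For reference, a typical conversion at the time of writing looked like the following. Script names, arguments, and quantization type codes differ between llama.cpp versions, and the model paths here are placeholders, so consult the llama.cpp documentation for your checkout:
# run inside a llama.cpp checkout; convert .pth to f16 GGML, then quantize
python3 convert-pth-to-ggml.py models/7B/ 1
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0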
For RWKV models:
RWKV is an open-source model developed by PENG Bo, and all of its model weights and training code are open source. Our RWKV backend uses the rwkv.cpp native bindings, which also use the GGML tensor format. You can download a GGML-quantized model from here, or convert one yourself by following the documentation (see the sketch below).
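As with LLaMA, conversion is a single script invocation. The script path and arguments below match rwkv.cpp at the time of writing but may change between versions, and the file names are placeholders:
# run inside an rwkv.cpp checkout; convert a .pth checkpoint to f16 GGML
python3 rwkv/convert_pytorch_to_ggml.py ./RWKV-4-Pile-169M.pth ./rwkv-169M-f16.bin float16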
First example
This first example uses llama.cpp as the inference backend; make sure you have installed the @llama-node/llama-cpp package.
// index.mjs
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "../ggml-vic7b-q5_1.bin");
const llama = new LLM(LLamaCpp);

// model loading configuration
const config = {
    modelPath: model,
    enableLogging: true,
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0,
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const run = async () => {
    await llama.load(config);

    // stream generated tokens to stdout as they arrive
    await llama.createCompletion(
        {
            nThreads: 4,
            nTokPredict: 2048,
            topK: 40,
            topP: 0.1,
            temp: 0.2,
            repeatPenalty: 1,
            prompt,
        },
        (response) => {
            process.stdout.write(response.token);
        }
    );
};

run();
To run this example:
node index.mjs
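The callback streams tokens as they are generated. If you would rather receive the whole completion as a single string, a minimal sketch is to accumulate tokens inside the callback. This reuses the llama instance and prompt from the example above; completeText is a hypothetical helper, not part of the llama-node API:
// collect streamed tokens into a single string (hypothetical helper)
const completeText = async (prompt) => {
    let text = "";
    await llama.createCompletion({
        nThreads: 4,
        nTokPredict: 256,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        prompt,
    }, (response) => {
        text += response.token; // append each streamed token
    });
    return text;
};

console.log(await completeText(prompt));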
More examples
Visit our example folder here
Acknowledgments
This library is published under the MIT/Apache-2.0 license. However, we strongly recommend that you cite our work and the work of our dependencies if you wish to reuse code from this library.
Model/inference tool dependencies:
- LLaMA models: facebookresearch/llama
- RWKV models: BlinkDL/RWKV-LM
- llama.cpp: ggerganov/llama.cpp
- llm-rs: rustformers/llm
- rwkv.cpp: saharNooby/rwkv.cpp
Some source code comes from:
- cpp-rust bindings build scripts: sobelio/llm-chain
- rwkv logits sampling: KerfuffleV2/smolrsrwkv