# Cloudflare Worker AI Endpoint

An OpenAI API-compatible endpoint implementation for models on Cloudflare Worker AI.

Supports multiple models, multiple API keys, streaming output, and more.

## ✨ Features

- 🔄 Dynamically fetches the latest Cloudflare AI model list
- 🔑 Multiple API keys can be configured, preventing abuse by others
- 🎯 Multiple AI models can be configured and called
- 🌊 Streaming output (SSE)
- ✅ Complete parameter validation
- 🌐 CORS enabled by default
- 📝 Detailed error messages

## 🚀 Quick Start

### Installation

```bash
# Clone the project
git clone https://github.com/yourusername/cf-ai-endpoint.git
cd cf-ai-endpoint

# Install dependencies
npm install
```

### Configuration

1. Set the API key (multiple keys are supported, separated by commas):

```bash
# E.g.: generate a single API key and store it as a secret
openssl rand -base64 32 | tr -d '/+' | cut -c1-32 | npx wrangler secret put API_KEY
```

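If you need several keys at once, a minimal sketch like the one below (a hypothetical variation of the command above) generates two keys and stores them as a single comma-separated secret:

```bash
# Generate two keys and store them as one comma-separated API_KEY secret
KEY_A=$(openssl rand -base64 32 | tr -d '/+' | cut -c1-32)
KEY_B=$(openssl rand -base64 32 | tr -d '/+' | cut -c1-32)
echo "$KEY_A,$KEY_B" | npx wrangler secret put API_KEY

# Hand each consumer its own key so keys can be rotated individually later
echo "key A: $KEY_A"
echo "key B: $KEY_B"
```
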
2. Configure the list of allowed models (wrangler.toml):

```toml
# E.g.: allow the following 3 models to be called
[vars]
MODELS = "@cf/meta/llama-2-7b-chat-int8,@cf/meta/llama-2-7b-chat-fp16,@cf/mistral/mistral-7b-instruct-v0.1"
```

The corresponding environment variables can also be set manually in the Cloudflare dashboard.

> [!WARNING]
> In the dashboard, configure `API_KEY` as a **Secret** to set the API key used to access the endpoint, and make sure the key is stored somewhere safe.

### Deployment

```bash
npm run deploy
# or
npx wrangler publish
```

## 📖 API Reference

### 1. List available models

```http
GET /v1/models
Authorization: Bearer <your-api-key>
```

Example response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "@cf/meta/llama-2-7b-chat-int8",
      "object": "model",
      "created": 1708661717835,
      "owned_by": "cloudflare",
      "permission": [],
      "root": "@cf/meta/llama-2-7b-chat-int8",
      "parent": null,
      "metadata": {
        "description": "Quantized (int8) generative text model...",
        "task": "Text Generation",
        "context_window": "8192"
      }
    }
  ]
}
```

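For a quick check from the command line (the hostname and key are placeholders for your own deployment):

```bash
# List the models currently exposed by the Worker
curl https://your-worker.workers.dev/v1/models \
  -H "Authorization: Bearer <your-api-key>"
```
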
### 2. Text completion

```http
POST /v1/completions
Authorization: Bearer <your-api-key>
Content-Type: application/json

{
  "model": "@cf/meta/llama-2-7b-chat-int8",
  "prompt": "Hello",
  "stream": true
}
```

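For example, a one-shot (non-streaming) request can be sent with curl; hostname and key are placeholders as before, and the response follows the OpenAI completions format:

```bash
# Non-streaming text completion
curl https://your-worker.workers.dev/v1/completions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/meta/llama-2-7b-chat-int8",
    "prompt": "Hello",
    "stream": false
  }'
```
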
### 3. Chat completion

```http
POST /v1/chat/completions
Authorization: Bearer <your-api-key>
Content-Type: application/json

{
  "model": "@cf/meta/llama-2-7b-chat-int8",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": true
}
```

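With `stream: true` the endpoint replies with server-sent events, so a curl call needs `-N` to disable output buffering; each event should be a `data:` line carrying an OpenAI-style JSON delta (hostname and key are placeholders):

```bash
# Streaming chat completion; -N prints SSE chunks as they arrive
curl -N https://your-worker.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/meta/llama-2-7b-chat-int8",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```
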
## 👀 Supported Parameters

| Parameter          | Type    | Default | Range        | Description                          |
| ------------------ | ------- | ------- | ------------ | ------------------------------------ |
| model              | string  | -       | -            | Required; model ID                   |
| stream             | boolean | false   | -            | Whether to stream the response       |
| max_tokens         | integer | 256     | ≥1           | Maximum number of tokens to generate |
| temperature        | number  | 0.6     | 0-5          | Sampling temperature                 |
| top_p              | number  | -       | 0-2          | Nucleus sampling probability         |
| top_k              | integer | -       | 1-50         | Top-k sampling size                  |
| frequency_penalty  | number  | -       | 0-2          | Frequency penalty                    |
| presence_penalty   | number  | -       | 0-2          | Presence penalty                     |
| repetition_penalty | number  | -       | 0-2          | Repetition penalty                   |
| seed               | integer | -       | 1-9999999999 | Random seed                          |

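As an illustration, a chat request tuning several of these parameters might look like this (values are arbitrary but stay inside the ranges above; hostname and key are placeholders):

```bash
curl https://your-worker.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/meta/llama-2-7b-chat-int8",
    "messages": [{"role": "user", "content": "Write a haiku about the sea"}],
    "max_tokens": 128,
    "temperature": 0.8,
    "top_p": 0.9,
    "seed": 42
  }'
```
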
## 💻 Usage Examples

### Node.js (using the OpenAI SDK)

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://your-worker.workers.dev/v1",
  apiKey: "<your-api-key>",
});

// Streaming response
const stream = await openai.chat.completions.create({
  model: "@cf/meta/llama-2-7b-chat-int8",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

### fetch API

```javascript
const response = await fetch("https://your-worker.workers.dev/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer <your-api-key>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "@cf/meta/llama-2-7b-chat-int8",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  }),
});

// Read the streaming response chunk by chunk
const reader = response.body.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log(new TextDecoder().decode(value));
}
```

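The raw chunks printed above are SSE frames. In practice you usually want to split them into `data:` lines and extract the delta text; the sketch below does that with a hypothetical `printSSE` helper, assuming the Worker emits OpenAI-style `chat.completion.chunk` events terminated by `data: [DONE]` (which is what the SDK example above relies on):

```javascript
// Turn the SSE stream of a chat completion response into plain text on stdout.
async function printSSE(response) {
  const decoder = new TextDecoder();
  const reader = response.body.getReader();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by blank lines; keep any partial event in the buffer
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") continue;
        const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
        if (delta) process.stdout.write(delta);
      }
    }
  }
}
```

Call it as `await printSSE(response)` in place of the raw reader loop above.
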
## 📝 Notes

> [!NOTE]
>
> 1. Because the model list is fetched from the Cloudflare AI API, the first request may be a little slow
> 2. A stricter CORS policy is recommended for production use
> 3. Multiple API keys are supported, which makes access management and key rotation easier
> 4. The model configuration supports dynamic filtering, so the list of available models can be adjusted at any time
> 5. Content length is limited to 131072 characters

## 📄 License

MIT