Streaming

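The OpenAI API can stream responses back to the client, so that partial results are available for certain requests before the full response is complete. To do this, we follow the server-sent events standard. The official Node and Python libraries include helpers that make these events easier to parse.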

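Streaming is supported for both the Chat Completions API and the Assistants API. This section focuses on how streaming works for Chat Completions. For details on how streaming works with the Assistants API, see the Assistants API documentation.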

In Python, a streaming request looks like this:

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
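# Each chunk carries an incremental delta; its content may be None
# (for example, on the final chunk), so skip those.
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")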
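
Printing each delta as it arrives is often paired with keeping the full text for later use. The following is a minimal sketch of that idea, not part of the official library: `collect_text` is a hypothetical helper name, and it only assumes that each chunk has the `choices[0].delta.content` shape used in the example above. The fake chunks built with `SimpleNamespace` stand in for a real stream so the sketch can run without an API call.

```python
from types import SimpleNamespace


def collect_text(stream):
    """Join the content deltas of a chat completion stream into one string.

    `collect_text` is a hypothetical helper; `stream` is any iterable of
    chunk-like objects exposing `choices[0].delta.content`.
    """
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices at all.
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("This is"), fake_chunk(" a test"), fake_chunk(None)]
print(collect_text(fake_stream))  # prints "This is a test"
```

With a real request, the `stream` object returned by `client.chat.completions.create(..., stream=True)` could be passed in place of `fake_stream`.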

In Node/TypeScript, a streaming request looks like this:

import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const stream = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Say this is a test" }],
        stream: true,
    });
    // Write each incremental delta as it arrives; fall back to "" when a chunk has no content.
    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }
}

main();

Parsing server-sent events

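Parsing server-sent events is non-trivial and should be done with caution. Simple strategies such as splitting on newlines can result in parsing errors, because a single event may arrive split across several network reads and may span multiple lines. We recommend using existing client libraries when possible.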
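
To illustrate why naive newline splitting falls short, here is a minimal sketch of an incremental parser. It is an illustration under stated assumptions, not the parser used by the official libraries: `iter_sse_data` is a hypothetical name, the input is assumed to be already-decoded text fragments, only `data:` fields and comment lines are handled, and bare carriage-return line endings (which the standard also allows) are not supported.

```python
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event.

    Hypothetical helper for illustration. `chunks` is any iterable of decoded
    text fragments that may be cut at arbitrary points, as network reads are.
    """
    buffer = ""       # text received so far that does not yet form a full line
    data_lines = []   # "data:" values collected for the event in progress
    for chunk in chunks:
        buffer += chunk
        # Consume only complete lines; a trailing partial line stays buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")  # tolerate CRLF line endings
            if line == "":
                # A blank line terminates the event: dispatch what was collected.
                if data_lines:
                    yield "\n".join(data_lines)
                    data_lines = []
            elif line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            elif line.startswith("data:"):
                value = line[len("data:"):]
                if value.startswith(" "):
                    value = value[1:]  # drop the single optional leading space
                data_lines.append(value)
            # Other fields (event, id, retry) are ignored in this sketch.


# The first event is split across two reads; the second spans two data lines.
reads = ['data: {"a"', ': 1}\n', '\ndata: line1\ndata: line2\n\n']
print(list(iter_sse_data(reads)))  # prints ['{"a": 1}', 'line1\nline2']
```

Splitting each read on newlines would have produced the fragments `data: {"a"` and `: 1}` as separate lines, neither of which is valid on its own. Buffering until a blank line arrives avoids that, which is the kind of detail an existing client library already takes care of.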
