When integrating Large Language Models (LLMs) into real-time applications, streaming structured JSON responses can be tricky. LLMs generate content progressively, meaning responses are streamed in chunks. However, this often results in incomplete or malformed JSON, making it difficult to parse and use the data reliably.
I faced this challenge while building a real-time chatbot API that returned structured responses, such as:
- A suggested response for the user.
- A list of recommended actions based on the conversation context.
However, since JSON arrived in fragments, parsing failed until the entire response was complete. To solve this, I implemented a solution in C# that streams data safely, repairs malformed JSON, and ensures clients receive a fully correct response.

The Problem: Handling Malformed JSON in Streaming Data
Imagine you’re building a real-time chatbot for customer support. The bot generates responses in JSON format and streams them as they become available.
However, due to network latency and chunked responses, the JSON often arrives incomplete, like this:
{
  "suggested_response": "Hello! How can I assi
and, a few chunks later:
{
  "suggested_response": "Hello! How can I assist you today?",
  "recommended_actions": [
Neither fragment is valid JSON, and attempting to deserialize either one immediately results in an error.
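To make the failure concrete, here is a minimal repro. It uses System.Text.Json so it is self-contained; the same kind of exception occurs with Newtonsoft.Json, which is used later in this post. The class and method names are mine, for illustration only:

```csharp
using System;
using System.Text.Json;

public static class TruncatedJsonDemo
{
    /// <summary>
    /// Returns true only if the input is complete, valid JSON.
    /// </summary>
    public static bool CanParse(string json)
    {
        try
        {
            using var doc = JsonDocument.Parse(json);
            return true;
        }
        catch (JsonException)
        {
            // Thrown when the buffer ends mid-token, e.g. inside a string literal
            return false;
        }
    }
}
```

Feeding the truncated chunk above into `CanParse` returns false until the closing quote and brace finally arrive.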
Key Challenges
- Handling incomplete JSON: Since LLMs stream responses in small pieces, we must process the JSON incrementally without breaking the parser.
- Fixing malformed JSON in real time: As JSON arrives in fragments, we need a way to repair it dynamically before attempting to parse it.
- Preserving data integrity: Some JSON repair libraries alter the content inside strings, which can corrupt key data. Content such as LaTeX expressions or escape sequences is especially vulnerable to automated repair tools, so we must ensure the final data matches what the LLM originally generated.
- Sending a final verified response: Since streaming can be unreliable, we need a fallback mechanism that sends a complete, corrected JSON object at the end.
The Solution: Using JSON Repair with a Fallback Mechanism
To solve these challenges, I implemented a streaming JSON repair system in C# using the JsonRepairSharp library. This system:
- Streams JSON responses incrementally, allowing the client to get updates in real-time.
- Repairs malformed JSON as new data arrives.
- Prevents data corruption by extracting only the new content from each chunk.
- Sends a final complete response at the end to ensure accuracy.
public async Task ProcessChatResponse(ChatRequest req, CancellationToken cancellation)
{
    Response.Headers.Add(HeaderNames.ContentType, "text/event-stream");
    Response.Headers.Add("Cache-Control", "no-cache");

    StringBuilder finalJson = new();
    string previousResponse = "";

    var responseStream = _ChatService.GetChatStream(req, cancellation);

    await foreach (var chunk in responseStream)
    {
        if (cancellation.IsCancellationRequested)
            break;

        finalJson.Append(chunk);

        // Attempt to repair JSON dynamically
        JsonRepairSharp.JsonRepair.Context = JsonRepairSharp.JsonRepair.InputType.LLM;
        string repairedJson = JsonRepairSharp.JsonRepair.RepairJson(finalJson.ToString());
        if (string.IsNullOrWhiteSpace(repairedJson))
            continue;

        var chatResponse = ChatResponseModel.Deserialize(repairedJson);
        if (chatResponse?.SuggestedResponse == null)
            continue;

        // Extract only the newly generated text to avoid duplication
        string newContent = ExtractNewContent(previousResponse, chatResponse.SuggestedResponse);
        if (!string.IsNullOrEmpty(newContent))
        {
            string jsonResponse = new PartialResponseDto { Type = "chat_update", Content = newContent }.GetJSON();
            await Response.WriteAsync($"data: {jsonResponse}\n\n", cancellation);
            await Response.Body.FlushAsync(cancellation);
        }

        previousResponse = chatResponse.SuggestedResponse;
    }

    // Send the final, complete JSON to ensure correctness
    string completeJson = CleanJsonString(finalJson.ToString());
    var finalChatResponse = ChatResponseModel.Deserialize(completeJson);
    if (finalChatResponse?.SuggestedResponse != null)
    {
        string jsonResponse = new PartialResponseDto { Type = "final_response", Content = finalChatResponse.SuggestedResponse }.GetJSON();
        await Response.WriteAsync($"data: {jsonResponse}\n\n", cancellation);
        await Response.Body.FlushAsync(cancellation);
    }

    // Signal completion
    await Response.WriteAsync("data: [DONE]\n\n", cancellation);
    await Response.Body.FlushAsync(cancellation);
}
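The helpers ExtractNewContent and CleanJsonString are referenced above but not shown. Here is a minimal sketch of how they could look; the prefix assumption in ExtractNewContent and the markdown-fence stripping in CleanJsonString are my reading of the intent, not the exact implementation:

```csharp
using System;

public static class StreamingHelpers
{
    /// <summary>
    /// Returns only the text appended since the previous snapshot.
    /// Assumes each repaired response extends the previous one as a prefix;
    /// if the repaired text diverges, the full current text is returned.
    /// </summary>
    public static string ExtractNewContent(string previous, string current)
    {
        if (string.IsNullOrEmpty(previous))
            return current;

        return current.StartsWith(previous, StringComparison.Ordinal)
            ? current.Substring(previous.Length)
            : current;
    }

    /// <summary>
    /// Trims whitespace and strips markdown code fences that LLMs
    /// sometimes wrap around their JSON output.
    /// </summary>
    public static string CleanJsonString(string raw)
    {
        string s = raw.Trim();
        if (s.StartsWith("```", StringComparison.Ordinal))
        {
            int firstNewline = s.IndexOf('\n');
            if (firstNewline >= 0)
                s = s.Substring(firstNewline + 1);
            if (s.EndsWith("```", StringComparison.Ordinal))
                s = s.Substring(0, s.Length - 3);
        }
        return s.Trim();
    }
}
```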
To ensure structured JSON responses, I used a simple DTO class:
public class PartialResponseDto
{
    /// <summary>
    /// Specifies the type of response, e.g., "chat_update" for intermediate
    /// updates or "final_response" for the complete response.
    /// </summary>
    public string Type { get; set; } = "";

    /// <summary>
    /// Contains the chatbot-generated content or message.
    /// </summary>
    public string Content { get; set; } = "";

    /// <summary>
    /// Converts the object to a JSON-formatted string.
    /// </summary>
    public string GetJSON()
    {
        return Newtonsoft.Json.JsonConvert.SerializeObject(this);
    }
}
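For context, what the client actually sees on the wire is a plain server-sent-events stream: one "data: ..." line per frame, terminated by the "[DONE]" sentinel. A minimal sketch of a client-side frame parser (the class and frame strings are illustrative, not part of the API above):

```csharp
using System;

public static class SseFrameParser
{
    /// <summary>
    /// Extracts the payload of a single SSE frame of the form "data: {...}\n\n".
    /// Returns null for the "[DONE]" sentinel that signals end of stream.
    /// </summary>
    public static string? ParseFrame(string frame)
    {
        string payload = frame.Trim();
        if (payload.StartsWith("data: ", StringComparison.Ordinal))
            payload = payload.Substring("data: ".Length);
        return payload == "[DONE]" ? null : payload;
    }
}
```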
How This Solution Works
1. Streaming JSON in Chunks
- The API streams JSON responses incrementally instead of waiting for the full response.
2. Repairing Malformed JSON Dynamically
- As new data arrives, JsonRepairSharp.JsonRepair.RepairJson() fixes incomplete JSON structures to make them parseable.
3. Extracting Only New Content
- ExtractNewContent() ensures that only new text is streamed to the client, preventing redundant updates.
4. Sending a Final Verified Response
- Once the LLM completes its response, a final, fully repaired JSON is sent to ensure correctness.
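On the client side, the two event types map naturally onto append-then-replace semantics: "chat_update" deltas are appended as they arrive, and "final_response" overwrites the accumulated text with the verified version. A sketch (the class name is mine, for illustration):

```csharp
using System.Text;

public class ChatTranscript
{
    private readonly StringBuilder _buffer = new();

    /// <summary>
    /// Applies a streamed event: "chat_update" appends the delta,
    /// while "final_response" replaces everything with the verified text.
    /// </summary>
    public void Apply(string type, string content)
    {
        if (type == "final_response")
        {
            _buffer.Clear();
            _buffer.Append(content);
        }
        else if (type == "chat_update")
        {
            _buffer.Append(content);
        }
    }

    public string Text => _buffer.ToString();
}
```

Because the final event replaces rather than appends, any repair artifacts that slipped into intermediate updates are corrected automatically at the end.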
Key Benefits of This Approach
✅ Real-time updates: Clients receive chatbot responses instantly without waiting.
✅ Resilient JSON handling: The repair mechanism prevents parsing errors due to incomplete JSON.
✅ Data integrity: The final JSON ensures the full response is correct and unaltered.
✅ Improved user experience: Chatbots feel more responsive and interactive.
Conclusion
Streaming JSON from LLMs in C# comes with challenges, but by implementing incremental JSON repair and structured streaming, we can ensure that:
✔️ Data is delivered in real-time.
✔️ Incomplete JSON is handled gracefully.
✔️ Clients always receive a complete, valid response.
This solution has significantly improved how our chatbot API handles JSON streaming. If you’re working with real-time AI applications, I hope this guide helps you build robust and error-free JSON streaming systems!
Have you encountered JSON streaming issues in your projects? Share your thoughts in the comments!

Amit Mittal is a seasoned tech entrepreneur and software developer specializing in SaaS, web hosting, and cloud solutions. As the founder of Bitss Techniques, he brings over 15 years of experience in delivering robust digital solutions, including CMS development, custom plugins, and enterprise-grade services for the education and e-commerce sectors.