Is it possible to generate a dataset from existing documents using Ollama or OpenAI? #224

I am working on automating the generation of structured datasets using Kiln AI's TaskRun. My goal is to take a summarized text as input and generate dataset entries in the format shown in the UI (see attached image).

![Image](https://github.com/user-attachments/assets/df714c46-e505-4c51-b1b0-b8bd1e8e33b9)

This is my task instruction:

![Image](https://github.com/user-attachments/assets/3c4814c4-aaf6-4598-82d1-2878acb740ec)

I want to create a dataset from existing documents.
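Before any model is called, each existing document has to be turned into task inputs. A minimal sketch, assuming documents are plain text with paragraphs separated by blank lines; `split_into_passages` and the 800-character limit are hypothetical choices, not part of Kiln:

```python
import re


def split_into_passages(document: str, max_chars: int = 800) -> list[str]:
    """Split a document into passages of at most max_chars characters.

    Paragraphs (separated by blank lines) are packed greedily; a single
    paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    passages: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current passage
        else:
            if current:
                passages.append(current)  # close the full passage
            current = para  # start a new passage with this paragraph
    if current:
        passages.append(current)
    return passages
```

Each returned passage (or a summary of it) can then serve as the `input` of one TaskRun.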

In the UI, I can enter text directly and generate a single dataset entry, like this:

![Image](https://github.com/user-attachments/assets/17e2572d-1066-42ef-9532-3aa563018c79)

I want to generate a whole dataset this way, so I need to automate the process.

I read the Kiln AI core docs and found that I can create multiple TaskRun instances in a loop. However, with the following code, which creates a single TaskRun, I can't get an LLM to produce the expected output from the summarized text.

```python
import json

import kiln_ai.datamodel

# `task` is the existing kiln_ai.datamodel.Task that this run belongs to
item = kiln_ai.datamodel.TaskRun(
    parent=task,
    input="The AIE1515_FC_C does not come with an operating system which must be loaded first before installation of any software into the computer.",
    input_source=kiln_ai.datamodel.DataSource(
        type=kiln_ai.datamodel.DataSourceType.synthetic,
        properties={"model_name": "phi4", "model_provider": "ollama", "adapter_name": "ollama_adapter"},
    ),
    # The output is only an empty placeholder: nothing here calls the model
    output=kiln_ai.datamodel.TaskOutput(
        output=json.dumps({"answer_1": "", "question_1": "", "answer_2": "", "question_2": "", "answer_3": "", "question_3": ""}),
        source=kiln_ai.datamodel.DataSource(
            type=kiln_ai.datamodel.DataSourceType.synthetic,
            properties={"model_name": "phi4", "model_provider": "ollama", "adapter_name": "ollama_adapter"},
        ),
    ),
)
```
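A TaskRun as constructed above only records an input and an output; it does not invoke a model, so the output has to be produced first. One option is to call Ollama's local REST endpoint `/api/generate` directly with `"format": "json"` and `"stream": false`. This sketch assumes Ollama is running on its default port 11434 with `phi4` pulled; the prompt wording and the helper names `build_request` and `generate_qa` are hypothetical. Kiln's own adapter layer (for example `adapter_for_task` in `kiln_ai.adapters.adapter_registry`) may be a more direct route, but check its signature for your installed version.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint (assumption)

# Hypothetical prompt; adapt it to the task instruction configured in Kiln.
PROMPT_TEMPLATE = (
    "Read the text below and write three question/answer pairs about it.\n"
    "Reply with a JSON object with exactly these string keys: "
    "question_1, answer_1, question_2, answer_2, question_3, answer_3.\n\n"
    "Text:\n{text}"
)


def build_request(text: str, model: str = "phi4") -> urllib.request.Request:
    """Build a non-streaming Ollama /api/generate request that asks for JSON output."""
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,  # return one response object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_qa(text: str, model: str = "phi4") -> dict:
    """Send the request to a running Ollama server and parse the JSON reply."""
    with urllib.request.urlopen(build_request(text, model), timeout=300) as resp:
        body = json.load(resp)
    # With stream=False, the model's generated text is in the "response" field.
    return json.loads(body["response"])
```

The result can then replace the empty placeholder, e.g. `output=kiln_ai.datamodel.TaskOutput(output=json.dumps(generate_qa(text)), source=...)`.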
# Expected Outcome:
- The LLM should process the given summarized text and automatically generate answer_1, question_1, answer_2, question_2, etc.
- The dataset should be structured as shown in the UI example.
- This process should be scalable, allowing multiple TaskRun instances to be created in a loop.
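To scale this to many inputs, the generation step can run in a loop that checks each output against the expected keys and retries when the model returns something incomplete. A minimal sketch, assuming a `generate` callable (for example a function that calls Ollama or OpenAI) that returns a dict and may raise `ValueError` on malformed JSON; `build_dataset`, `is_valid`, and the retry count are hypothetical:

```python
import json
from typing import Callable

# question_1, answer_1, ..., question_3, answer_3
EXPECTED_KEYS = [f"{kind}_{i}" for i in range(1, 4) for kind in ("question", "answer")]


def is_valid(output: dict) -> bool:
    """True if every expected key is present and holds a non-empty string."""
    return all(isinstance(output.get(k), str) and output[k].strip() for k in EXPECTED_KEYS)


def build_dataset(
    inputs: list[str],
    generate: Callable[[str], dict],
    max_attempts: int = 3,
) -> tuple[list[dict], list[str]]:
    """Run `generate` on each input, retrying when the output is incomplete.

    Returns (records, failed_inputs). Each record pairs the input text with the
    JSON string that would go into TaskOutput(output=...).
    """
    records, failed = [], []
    for text in inputs:
        for _ in range(max_attempts):
            try:
                output = generate(text)
            except (ValueError, KeyError):  # e.g. malformed JSON from the model
                continue
            if isinstance(output, dict) and is_valid(output):
                records.append({"input": text, "output": json.dumps(output)})
                break
        else:  # no attempt produced a valid output
            failed.append(text)
    return records, failed
```

Each record could then be used to build one TaskRun inside the loop; Kiln's docs show `save_to_file()` for persisting a TaskRun, but confirm this against your version.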
# Request for Help:
- How can I modify TaskRun so that the LLM actively generates the structured dataset instead of just initializing an empty output?
- Are there specific parameters or methods in Kiln AI that allow integrating the LLM for dynamic output generation?
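When a model is called without a JSON mode (or through an adapter that returns raw text), replies are often wrapped in a Markdown code fence or surrounded by prose. A defensive parser, sketched here as the hypothetical helper `extract_json_object`, pulls out the JSON object before it is stored as a TaskRun output; it is an illustration, not a Kiln API:

```python
import json
import re

# Matches a fenced block such as ```json ... ``` and captures its body.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_object(reply: str) -> dict:
    """Parse a JSON object from a model reply that may include fences or extra prose."""
    match = _FENCE.search(reply)
    candidate = match.group(1) if match else reply
    # Take the span from the first "{" to the last "}" to skip surrounding text.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(candidate[start : end + 1])  # raises ValueError if malformed
```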

Any help or suggestions would be greatly appreciated!
