I am working on automating the generation of structured datasets using Kiln AI's TaskRun. My goal is to take a summarized text input and generate an expected dataset in the format shown in the UI (see attached image).

This is my task instruction:

I want to create a dataset from existing documents.
In the UI, I can enter a text directly and generate a single dataset item like this.

I want to generate a whole dataset this way, so I need to automate the process.
I referred to the Kiln AI core docs and found that I can create multiple TaskRun instances in a loop. However, with the following code, which creates a single TaskRun, I cannot get an LLM to produce the expected dataset from the summarized text; the output only contains the empty values I pass in:

    import json

    import kiln_ai.datamodel

    # `task` is the existing Kiln task that this run is attached to.
    item = kiln_ai.datamodel.TaskRun(
        parent=task,
        input="The AIE1515_FC_C does not come with an operating system which must be loaded first before installation of any software into the computer.",
        input_source=kiln_ai.datamodel.DataSource(
            type=kiln_ai.datamodel.DataSourceType.synthetic,
            properties={
                "model_name": "phi4",
                "model_provider": "ollama",
                "adapter_name": "ollama_adapter",
            },
        ),
        output=kiln_ai.datamodel.TaskOutput(
            # Placeholder output: every field is an empty string.
            output=json.dumps(
                {
                    "answer_1": "", "question_1": "",
                    "answer_2": "", "question_2": "",
                    "answer_3": "", "question_3": "",
                }
            ),
            source=kiln_ai.datamodel.DataSource(
                type=kiln_ai.datamodel.DataSourceType.synthetic,
                properties={
                    "model_name": "phi4",
                    "model_provider": "ollama",
                    "adapter_name": "ollama_adapter",
                },
            ),
        ),
    )

Expected Outcome:
- The LLM should process the given summarized text and automatically generate answer_1, question_1, answer_2, question_2, etc.
- The dataset should be structured as shown in the UI example.
- This process should be scalable, allowing multiple TaskRun instances to be created in a loop.
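
To make the goal concrete, here is a rough sketch of the loop I have in mind. It does not use Kiln AI at all: `generate_qa` is a hypothetical placeholder for whatever call ends up invoking the LLM, and `build_records`, `fake_generate_qa`, and `EXPECTED_KEYS` are names I made up for illustration. The only parts taken from my setup are the six output fields and the idea of storing the output as a JSON string.

```python
import json
from typing import Callable

# Output fields expected for each item: question_1, answer_1, ..., question_3, answer_3.
EXPECTED_KEYS = [f"{kind}_{i}" for i in range(1, 4) for kind in ("question", "answer")]


def build_records(summaries: list[str], generate_qa: Callable[[str], dict]) -> list[dict]:
    """Pair each summary with a JSON output string produced by generate_qa.

    generate_qa is a hypothetical stand-in for the LLM call; it is NOT a Kiln AI API.
    """
    records = []
    for text in summaries:
        qa = generate_qa(text)
        # Treat a missing or empty field as an incomplete generation.
        missing = [key for key in EXPECTED_KEYS if not qa.get(key)]
        if missing:
            # Skip incomplete generations instead of saving empty fields.
            continue
        output = json.dumps({key: qa[key] for key in EXPECTED_KEYS})
        records.append({"input": text, "output": output})
    return records


def fake_generate_qa(text: str) -> dict:
    """Dummy generator so the sketch runs without a model."""
    return {key: f"{key} for: {text[:20]}" for key in EXPECTED_KEYS}
```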
Request for Help:
- How can I modify TaskRun so that the LLM actively generates the structured dataset instead of just initializing an empty output?
- Are there specific parameters or methods in Kiln AI that allow integrating the LLM for dynamic output generation?
Any help or suggestions would be greatly appreciated!