Code Review
This pull request significantly enhances the data annotation framework by introducing advanced example selection strategies (similarity, diversity, and quality-based), task-specific guidance, and specialized workflows like self-consistency and multi-turn dialogs. It also implements batch processing for improved efficiency. The review feedback highlights several improvement opportunities, including replacing hardcoded absolute paths with relative ones to ensure portability, optimizing tokenizer initialization, correcting function type hints, and increasing the flexibility of the batch annotation utility.
DATA_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/data'
OUTPUT_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs'
parser.add_argument('--max_input_length', type=int, default=10_000,
                    help='Maximum input length for the model.')
parser.add_argument('--log_path_prefix', type=str,
                    default='/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs/',
The default value for --log_path_prefix is a hardcoded absolute path. A relative path such as './outputs/' would let the script run in other environments without modification.
Suggested change:
- default='/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs/',
+ default='./outputs/',
                    default='/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs/',
                    help='Prefix path to save the evaluation logs.')
parser.add_argument('--tokenizer_path', type=str,
                    default='/root/flagos/Qwen3-4B')
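Applying the same principle to every option, the parser could look like the sketch below. The relative `'./Qwen3-4B'` tokenizer default is illustrative only: the checkpoint location is deployment-specific, so it could equally come from an environment variable.

```python
import argparse

# Sketch of the argument parser with portable, relative defaults.
# './Qwen3-4B' is a placeholder, not the PR's actual default.
parser = argparse.ArgumentParser()
parser.add_argument('--max_input_length', type=int, default=10_000,
                    help='Maximum input length for the model.')
parser.add_argument('--log_path_prefix', type=str, default='./outputs/',
                    help='Prefix path to save the evaluation logs.')
parser.add_argument('--tokenizer_path', type=str, default='./Qwen3-4B')

args = parser.parse_args([])  # empty argv: every option falls back to its default
```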
lines = text.strip().split('\n')
tokenizer = AutoTokenizer.from_pretrained("/root/flagos/Qwen3-4B", trust_remote_code=True)
The tokenizer is being re-initialized from a hardcoded absolute path every time select_examples is called. This is inefficient and makes the code dependent on a specific file system structure. Consider passing the tokenizer as an argument to the function or using a global cache to store the initialized tokenizer.
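One way to do this, sketched below: a module-level cache keyed by path, so `AutoTokenizer.from_pretrained` runs at most once per checkpoint. The `loader` parameter is an illustrative injection hook, not part of the original code.

```python
_TOKENIZER_CACHE = {}

def get_tokenizer(tokenizer_path, loader=None):
    """Return a cached tokenizer, loading it at most once per path."""
    if tokenizer_path not in _TOKENIZER_CACHE:
        if loader is None:
            # Deferred import: only paid on the first (cache-miss) call.
            from transformers import AutoTokenizer
            loader = lambda p: AutoTokenizer.from_pretrained(p, trust_remote_code=True)
        _TOKENIZER_CACHE[tokenizer_path] = loader(tokenizer_path)
    return _TOKENIZER_CACHE[tokenizer_path]
```

`select_examples` would then call `get_tokenizer(args.tokenizer_path)` instead of re-reading the checkpoint from disk on every invocation, and the path would come from the CLI rather than being baked in.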
def count_answer(text: str) -> tuple[list, dict]:
    return cleaned_text
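If the `tuple[list, dict]` annotation and the `return cleaned_text` line belong to the same function (which this excerpt suggests but does not prove), the hint contradicts the returned value. A hedged sketch of the corrected signature, with a placeholder body standing in for the PR's real cleaning logic:

```python
def count_answer(text: str) -> str:
    # Placeholder body; the point is that the return annotation
    # matches what the function actually returns.
    cleaned_text = text.strip()
    return cleaned_text
```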
def annotate_nvidia(input_prompt: str) -> list[str]:
results = [None] * len(prompts)

with ThreadPoolExecutor(max_workers=num_workers) as executor:
    future_to_idx = {executor.submit(annotate_ascend, p, max_tokens, use_count_answer, task_id): i for i, p in enumerate(prompts)}
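The flexibility concern from the review summary could be addressed by passing the backend annotator in as a parameter instead of hardcoding `annotate_ascend`. A sketch (the `annotate_batch` name and `annotate_fn` parameter are illustrative, not from the PR):

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_batch(prompts, annotate_fn, num_workers=8, **annotate_kwargs):
    """Run annotate_fn over prompts concurrently, preserving input order."""
    results = [None] * len(prompts)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        future_to_idx = {
            executor.submit(annotate_fn, p, **annotate_kwargs): i
            for i, p in enumerate(prompts)
        }
        # Each result is written back to the slot of its originating prompt.
        for future, idx in future_to_idx.items():
            results[idx] = future.result()
    return results
```

With this shape, `annotate_batch(prompts, annotate_ascend, max_tokens=..., ...)` and `annotate_batch(prompts, annotate_nvidia)` share one utility instead of duplicating the executor boilerplate per backend.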
Signed-off-by: peyiran peyiran@proton.me