Dddug patch 2 by Dddug · Pull Request #192 · FlagAI-Open/OpenSeek

Dddug · 2026-04-16T16:51:16Z

No description provided.

Signed-off-by: Dddug <xxx@email.com>

gemini-code-assist

Code Review

This pull request significantly expands the data annotation framework by introducing advanced example selection strategies (similarity, diversity, and quality-based), task-specific guidance, and robust annotation methods like batch processing and self-consistency. It also integrates social media sentiment analysis and Chain-of-Thought (CoT) reasoning. Feedback focuses on improving portability by avoiding hardcoded absolute paths, correcting mismatched type hints in count_answer and annotate_batch, and removing redundant code such as unnecessary try-except blocks, duplicate imports, and unused variables.

gemini-code-assist · 2026-04-16T16:52:58Z

+DATA_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/data'
+OUTPUT_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs'


Using hardcoded absolute paths makes the code non-portable and dependent on a specific environment structure. It is better to use relative paths or environment variables to define data and output directories.

DATA_DIR = './data'
OUTPUT_DIR = './outputs'
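A minimal sketch of the environment-variable option, assuming the hypothetical variable names `OPENSEEK_DATA_DIR` and `OPENSEEK_OUTPUT_DIR` (they are not defined anywhere in the PR). When a variable is unset, it falls back to the relative defaults suggested above, which resolve against the current working directory.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default: str) -> Path:
    """Return the directory named by `env_var`, or the relative `default` if it is unset."""
    return Path(os.environ.get(env_var, default)).expanduser()


# Hypothetical variable names; override them per machine instead of editing the code.
DATA_DIR = resolve_dir("OPENSEEK_DATA_DIR", "./data")
OUTPUT_DIR = resolve_dir("OPENSEEK_OUTPUT_DIR", "./outputs")
```

Running with `OPENSEEK_DATA_DIR=/root/flagos/.../data` would then reproduce the original layout without hardcoding it.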

gemini-code-assist · 2026-04-16T16:52:58Z

+    try:
+
+        tfidf_matrix = vectorizer.fit_transform(texts)
+    except:
+
+        tfidf_matrix = vectorizer.transform(texts)


The try-except block here is redundant and potentially misleading. fit_transform refits the vectorizer on every call, so there is no "already fitted" case that needs a transform fallback. The except branch only runs when fit_transform itself fails (for example, when the texts yield an empty vocabulary); the bare except: then hides the original error, and if the vectorizer has not been fitted before, transform raises a different one. Since this function computes similarity for a specific set of texts, fit_transform alone is sufficient.

tfidf_matrix = vectorizer.fit_transform(texts)
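To illustrate why a single fit on the texts being compared is enough, here is a stdlib-only sketch of TF-IDF cosine similarity. It is a stand-in, not the PR's code: it assumes the vectorizer uses scikit-learn's `TfidfVectorizer` defaults (lowercasing, token pattern `(?u)\b\w\w+\b`, smoothed IDF, L2 normalisation), and the names `tfidf_vectors` and `similarity_matrix` are hypothetical.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's TfidfVectorizer (tokens of 2+ word chars).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vectors(texts):
    """Fit IDF on `texts` and return one L2-normalised {term: weight} dict per text."""
    docs = [Counter(TOKEN_RE.findall(t.lower())) for t in texts]
    n = len(docs)
    # Document frequency: each term counted once per document it appears in.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF, matching smooth_idf=True: ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in docs:
        vec = {term: count * idf[term] for term, count in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def similarity_matrix(texts):
    """Pairwise cosine similarity; vectors are already unit length, so a dot product suffices."""
    vecs = tfidf_vectors(texts)
    return [[sum(w * v.get(t, 0.0) for t, w in u.items()) for v in vecs] for u in vecs]
```

One difference worth knowing: scikit-learn raises `ValueError` when no text contains a single token, whereas this sketch simply returns zero similarity for such texts.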

gemini-code-assist · 2026-04-16T16:52:59Z

+        score *= 0.7
+
+
+    digit_ratio = sum(1 for c in input_text if c.isdigit()) / max(input_len, 1)


The variable digit_ratio is calculated but never used to adjust the quality score. If this was intended to be part of the quality heuristic, it should be applied to the score variable; otherwise, it should be removed.
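If the intent was to penalise digit-heavy inputs, a minimal sketch of folding `digit_ratio` into the score might look like this. Only `score *= 0.7` and the `digit_ratio` expression come from the diff; the function name `quality_score` and the thresholds `min_len` and `max_digit_ratio` are hypothetical.

```python
def quality_score(input_text: str, min_len: int = 20, max_digit_ratio: float = 0.3) -> float:
    """Heuristic quality score in (0, 1]; thresholds are illustrative assumptions."""
    score = 1.0
    input_len = len(input_text)
    # Penalise very short inputs (same 0.7 factor as in the diff).
    if input_len < min_len:
        score *= 0.7
    # Actually use digit_ratio instead of computing it and discarding it.
    digit_ratio = sum(1 for c in input_text if c.isdigit()) / max(input_len, 1)
    if digit_ratio > max_digit_ratio:
        score *= 0.7
    return score
```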

gemini-code-assist · 2026-04-16T16:52:59Z

+
+
+
+def count_answer(text: str) -> tuple[list, dict]:


The type hint tuple[list, dict] for count_answer does not match the implementation, which returns a single string (or None). Additionally, the docstring's return description is inconsistent with the actual return value.

Suggested change

-def count_answer(text: str) -> tuple[list, dict]:
+def count_answer(text: str) -> str | None:

gemini-code-assist · 2026-04-16T16:52:59Z

+    return prediction, whole_result
+
+
+def annotate_batch(prompts: list[str], num_workers: int = 4, max_tokens: int = 128, use_count_answer: bool = True, task_id: int = None) -> list[str]:


The return type hint list[str] is incorrect. The function returns a list of tuples containing the prediction and the whole result from annotate_ascend.

Suggested change

-def annotate_batch(prompts: list[str], num_workers: int = 4, max_tokens: int = 128, use_count_answer: bool = True, task_id: int = None) -> list[str]:
+def annotate_batch(prompts: list[str], num_workers: int = 4, max_tokens: int = 128, use_count_answer: bool = True, task_id: int = None) -> list[tuple[str | None, str | None]]:

gemini-code-assist · 2026-04-16T16:52:59Z

+
+
+
+    import re


The import re statement inside extract_social_media_features is redundant as re is already imported at the top of the file (line 1).
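With `re` imported once at module level, patterns can also be compiled once rather than on each call. The sketch below assumes a hypothetical feature set (hashtag, mention and URL counts); the real body of `extract_social_media_features` is not shown in the diff.

```python
import re

# Compiled once at import time; no function-local `import re` is needed.
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")


def extract_social_media_features(text: str) -> dict[str, int]:
    """Count a few surface features of a post (illustrative feature set)."""
    return {
        "hashtags": len(HASHTAG_RE.findall(text)),
        "mentions": len(MENTION_RE.findall(text)),
        "urls": len(URL_RE.findall(text)),
    }
```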

Dddug added 2 commits April 17, 2026 00:49

Add files via upload

0fb28ec

Add files via upload

3eff16c

Signed-off-by: Dddug <xxx@email.com>

Dddug wants to merge 2 commits into FlagAI-Open:main from Dddug:Dddug-patch-2
