Add files via upload #197

Open
shaoyuheng1 wants to merge 1 commit into FlagAI-Open:main from shaoyuheng1:shaoyuheng1-patch-1

Conversation

@shaoyuheng1

No description provided.

Signed-off-by: shaoyuheng1 <shaoyuheng1@outlook.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the data annotation pipeline by introducing advanced example selection strategies, including similarity-based, diversity-based, and contrastive learning approaches. It adds support for task-specific guidance, Chain-of-Thought reasoning, self-consistency voting, and social media sentiment analysis. Additionally, the implementation now supports batch processing and parallel annotation requests. Feedback focuses on improving code portability by removing hardcoded absolute paths, cleaning up formatting artifacts (extraneous backslashes), refining broad exception handling, and reducing duplication in API configurations.

Comment on lines +14 to +15
DATA_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/data'
OUTPUT_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs'

high

Hardcoding absolute paths specific to a local environment (/root/flagos/...) makes the code non-portable. It is better to use relative paths or environment variables to define these directories.

Suggested change
- DATA_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/data'
- OUTPUT_DIR = '/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs'
+ DATA_DIR = './data'
+ OUTPUT_DIR = './outputs'
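
If environment variables are preferred over plain relative paths, a minimal sketch could look like the following. The variable names `ICL_DATA_DIR` and `ICL_OUTPUT_DIR` and the helper `resolve_dir` are hypothetical and do not appear in the PR; the relative defaults match the suggestion above.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default: str) -> Path:
    """Return the directory named by env_var, falling back to a relative default."""
    # env_var names used below are assumptions for illustration only.
    return Path(os.environ.get(env_var, default)).expanduser()


# Falls back to ./data and ./outputs when the variables are unset.
DATA_DIR = resolve_dir("ICL_DATA_DIR", "./data")
OUTPUT_DIR = resolve_dir("ICL_OUTPUT_DIR", "./outputs")
```

With this shape, the script runs unchanged from a fresh checkout, and a user with data elsewhere only needs to export the two variables.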

Comment on lines +35 to +38
default='/root/flagos/OpenSeek/openseek/competition/LongContext-ICL-Annotation/outputs/',
help='Prefix path to save the evaluation logs.')
parser.add_argument('--tokenizer_path', type=str,
default='/root/flagos/Qwen3-4B')

high

The default values for --log_path_prefix and --tokenizer_path are hardcoded to absolute paths. These should be changed to relative paths or made configurable to ensure the script can run in different environments.
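
One possible way to make these defaults portable is sketched below. The environment variable names `LOG_PATH_PREFIX` and `TOKENIZER_PATH` and the helper `build_parser` are assumptions, not part of the PR; `Qwen/Qwen3-4B` is used here as a Hub-style model ID in place of the local directory.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose path defaults do not depend on one machine's layout."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--log_path_prefix', type=str,
                        # Relative default; overridable via a hypothetical env var.
                        default=os.environ.get('LOG_PATH_PREFIX', './outputs/'),
                        help='Prefix path to save the evaluation logs.')
    parser.add_argument('--tokenizer_path', type=str,
                        # A model ID or a local directory both work as a string here.
                        default=os.environ.get('TOKENIZER_PATH', 'Qwen/Qwen3-4B'),
                        help='Local directory or model ID of the tokenizer.')
    return parser
```

The parsed `args.tokenizer_path` could then be passed to `AutoTokenizer.from_pretrained` instead of a literal path, which would keep a single source of truth for the tokenizer location.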

\


tokenizer = AutoTokenizer.from_pretrained("/root/flagos/Qwen3-4B", trust_remote_code=True)

high

The tokenizer path is hardcoded to an absolute path. This should be passed as a parameter or retrieved from a configuration to maintain portability.

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B", trust_remote_code=True)

Comment on lines +12 to +14
\
\
\

medium

There are numerous occurrences of backslashes (\) on empty lines throughout this file (e.g., lines 12-14, 29-32, etc.). This appears to be a formatting artifact or error that should be removed to improve code readability.
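
Assuming the stray backslashes always sit alone on otherwise empty lines, a small one-off cleanup could be sketched as follows. The function name `strip_backslash_only_lines` is hypothetical. Lines that end in a backslash as a genuine line continuation are left untouched, but a backslash-only line inside a multi-line string literal would also be blanked, so the result should be reviewed before committing.

```python
def strip_backslash_only_lines(source: str) -> str:
    """Blank out lines whose only non-whitespace content is a single backslash."""
    cleaned = [
        "" if line.strip() == "\\" else line
        for line in source.split("\n")
    ]
    return "\n".join(cleaned)
```

For example, applying it to the text of a file and writing the result back would remove the artifacts while keeping line numbers stable.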

Comment on lines +38 to +40
except:

tfidf_matrix = vectorizer.transform(texts)

medium

The bare except: block is too broad: it catches every BaseException, including KeyboardInterrupt and SystemExit. Catch specific exceptions instead (at minimum Exception, or better, the specific sklearn errors expected here) so that interrupts and interpreter exits still propagate.

    except Exception:
        tfidf_matrix = vectorizer.transform(texts)
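
The difference between a bare `except:` and `except Exception:` can be illustrated with a small, library-free sketch. The helper `run_with_fallback` is hypothetical and only stands in for the try/fallback pattern around the vectorizer call; the body of the original `try` block is not shown in this review, so the narrowest suitable exception type remains an open question.

```python
def run_with_fallback(primary, fallback):
    """Call primary(); on an ordinary error, call fallback() instead."""
    try:
        return primary()
    except Exception:
        # Ordinary errors (ValueError, AttributeError, ...) land here.
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # not Exception, so they propagate to the caller.
        return fallback()
```

If the expected failure is a specific one, listing that type in the `except` clause would narrow the handler further and avoid hiding unrelated bugs.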

Comment on lines +830 to +831
openai.base_url = "http://localhost:9010/v1/"
model = "Qwen3-4B-ascend-flagos"

medium

The API base URL and model name are hardcoded and duplicated in multiple functions (e.g., annotate_ascend and annotate_with_self_consistency). These should be defined as constants or passed as arguments.
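
One way to remove the duplication is to define the endpoint settings once and pass them around, as in the sketch below. `APIConfig`, `load_api_config`, and the environment variable names `ANNOTATION_API_BASE_URL` and `ANNOTATION_MODEL` are hypothetical; the default values are the ones currently hardcoded in the PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class APIConfig:
    """Single definition of the annotation endpoint settings."""
    base_url: str = "http://localhost:9010/v1/"
    model: str = "Qwen3-4B-ascend-flagos"


def load_api_config() -> APIConfig:
    """Build the config, letting hypothetical env vars override the defaults."""
    return APIConfig(
        base_url=os.environ.get("ANNOTATION_API_BASE_URL", APIConfig.base_url),
        model=os.environ.get("ANNOTATION_MODEL", APIConfig.model),
    )
```

Functions such as annotate_ascend and annotate_with_self_consistency could then take an `APIConfig` argument instead of setting the URL and model inline; their actual signatures are not shown in this review, so that wiring is left as an assumption.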
