Commit e1d2fd5

Merge pull request #987 from jeanbez/main
include two LBNL projects for OSRE26
2 parents 0ac5916 + 5932750 commit e1d2fd5

6 files changed

Lines changed: 45 additions & 5 deletions

content/authors/jeanlucabez/_index.md

Lines changed: 3 additions & 5 deletions
@@ -15,16 +15,14 @@ role: "Research Scientist, Lawrence Berkeley National Laboratory"
 
 # Organizations/Affiliations
 organizations:
-- name: Scientific Data Management Research
-  url: "https://crd.lbl.gov/divisions/scidata/sdm"
 - name: Computing Sciences Research Division
-  url: "https://crd.lbl.gov"
+  url: "https://cs.lbl.gov"
 - name: Lawrence Berkeley National Laboratory
   url: "https://www.lbl.gov"
 
 
 # Short bio (displayed in user profile at end of posts)
-bio: Jean Luca's research interests are in high-performance computing + I/O + storage.
+bio: Jean Luca is a Career-Track Research Scientist at Lawrence Berkeley National Laboratory (LBNL), USA. Jean Luca's research interests are in High Performance Computing (HPC), data management, I/O, storage, and AI data readiness.
 
 
 
@@ -35,7 +33,7 @@ bio: Jean Luca's research interests are in high-performance computing + I/O + st
 social:
 - icon: home
   icon_pack: fas
-  link: https://crd.lbl.gov/divisions/scidata/sdm/staff/jean-luca-bez/
+  link: https://profiles.lbl.gov/148621-jean-luca-bez
 - icon: github
   icon_pack: fab
   link: https://github.com/jeanbez
-266 KB
37 KB
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+title: "AI Data Readiness Inspector (AIDRIN)"
+authors: [jeanlucabez, surenbyna]
+author_notes: ["Lawrence Berkeley National Laboratory", "The Ohio State University (OSU)"]
+tags: ["osre26", "uc", "LBNL", "data science", "AI"]
+date: 2026-01-30T10:15:00-07:00
+lastmod: 2026-01-30T10:15:00-07:00
+---
+
+Garbage In, Garbage Out (GIGO) is a widely accepted quote in computer science across various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing the data for AI.
+
+[AIDRIN](https://arxiv.org/pdf/2406.19256) (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of data readiness for AI processes, covering a broad range of dimensions from the literature. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, to evaluate data. Furthermore, AIDRIN uses metrics specific to assessing AI data, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating data readiness.
+
+### AIDRIN Multiple File Formats
+
+The proposed work will include improvements in the AIDRIN framework to (1) add support for new file formats such as Zarr, ROOT, and HDF5; and (2) to allow providing custom data ingestion mechanisms.
+
+- **Topics:** `data readiness`, `AI`, `data analysis`
+- **Skills:** Python, C/C++, data analysis, good communicator
+- **Difficulty:** Moderate
+- **Size:** Large (350 hours)
+- **Mentors:** {{% mention jeanlucabez %}} and {{% mention surenbyna %}}
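
Note on the metrics named in this description: the traditional data quality checks it lists (completeness, duplicates) and one of the AI-specific ones (class imbalance) can be illustrated with a few lines of standard-library Python. This is only a minimal sketch under stated assumptions, not AIDRIN's implementation or API. The function names, the toy rows, and the choice of "max count over min count" as an imbalance measure are all hypothetical, and the input is assumed to be a non-empty list of dictionaries with `None` marking a missing value.

```python
from collections import Counter

def completeness(rows, columns):
    """Fraction of non-missing values per column (None counts as missing)."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) is not None) / total
        for col in columns
    }

def duplicate_fraction(rows, columns):
    """Fraction of rows that exactly repeat another row."""
    seen = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())
    return duplicates / len(rows)

def imbalance_ratio(rows, label):
    """Count of the most common class divided by the least common one."""
    counts = Counter(r[label] for r in rows)
    return max(counts.values()) / min(counts.values())

# Toy data, invented for illustration only.
rows = [
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": None, "income": 61000, "label": "no"},
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": 45, "income": None, "label": "yes"},
]
columns = ["age", "income", "label"]

print(completeness(rows, columns))
print(duplicate_fraction(rows, columns))
print(imbalance_ratio(rows, "label"))
```

A real ingestion path for Zarr, ROOT, or HDF5 would first have to map those formats onto some tabular or array view before checks like these apply; how AIDRIN does that is not described here.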
58.2 KB
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+---
+title: "Drishti"
+authors: [jeanlucabez, "Suren Byna"]
+author_notes: ["Lawrence Berkeley National Laboratory", "The Ohio State University (OSU)"]
+tags: ["osre26", "uc", "LBNL", "data science", "visualization", "profiling", "tracing"]
+date: 2026-01-30T10:15:00-07:00
+lastmod: 2026-01-30T10:15:00-07:00
+---
+
+[Drishti](https://github.com/hpc-io/drishti) is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications. Drishti aims to fill the gap between the trace collection, analysis, and tuning phases. The framework contains an interactive I/O trace analysis component for end-users to visually inspect their applications' I/O behavior, focusing on areas of interest and getting a clear picture of common root causes of I/O performance bottlenecks. Based on the automatic detection of I/O performance bottlenecks, our framework maps numerous common and well-known bottlenecks and their solution recommendations that can be implemented by users.
+
+### Drishti Comparisons and Heatmaps
+
+The proposed work will include investigating and building a solution to allow comparing and finding differences between two I/O trace files (similar to a `diff`), covering the analysis and visualization components. It will also explore additional metrics and counters such as Darshan heatmaps in the analysis and visualization components of the framework.
+
+- **Topics:** `I/O`, `HPC`, `data analysis`, `visualization`, `profiling`, `tracing`
+- **Skills:** Python, data analysis, performance profiling
+- **Difficulty:** Moderate
+- **Size:** Large (350 hours)
+- **Mentors:** {{% mention jeanlucabez %}} and [Suren Byna](mailto:sbyna@lbl.gov)
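
Note on the `diff`-style comparison proposed here: one simple way to picture it is a counter-by-counter comparison of two runs. The sketch below assumes each trace has already been reduced to a flat dictionary of integer counters. That reduction step, the `diff_counters` helper, and the values are hypothetical; the counter names follow Darshan's POSIX module naming but are used only as labels. This is not how Drishti works today, only an illustration of the idea.

```python
def diff_counters(before, after):
    """Return {counter: (before, after, delta)} for counters whose values differ.

    A counter missing from one run is treated as 0 in that run.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, 0)
        new = after.get(name, 0)
        if old != new:
            changes[name] = (old, new, new - old)
    return changes

# Invented counter values for two runs of the same application.
run_a = {"POSIX_READS": 1200, "POSIX_WRITES": 300, "POSIX_BYTES_WRITTEN": 4096000}
run_b = {"POSIX_READS": 1200, "POSIX_WRITES": 75,
         "POSIX_BYTES_WRITTEN": 4096000, "POSIX_SEEKS": 10}

for name, (old, new, delta) in diff_counters(run_a, run_b).items():
    print(f"{name}: {old} -> {new} ({delta:+d})")
```

Unchanged counters are omitted, as in a textual `diff`, so the output stays focused on what differs between the two runs.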
