The Imageomics Institute is hosting a 3.5-day workshop offering a unique opportunity to develop proofs of concept demonstrating how computer vision techniques can transform biodiversity data collection. The workshop will leverage one of the nation’s premier monitoring networks, the US National Science Foundation’s National Ecological Observatory Network (NEON). The treasure trove of data products collected at NEON sites provides an opportunity to push multi-modal model development applied to computer vision in tackling this challenge. The event will bring together an interdisciplinary group around a shared interest in using AI/ML to extract scientific knowledge from imagery data, including ML researchers, ecologists, beetle systematists, software developers, and data engineers. Participants will work in small groups to collaboratively curate or develop FAIR data products, best practices, tools, and other products targeting the motivating challenge.
The event will take place August 12-15 at The Ohio State University in Columbus, OH, USA. To apply to participate, please fill out the BeetlePalooza 2024 Application for Participation by the end of June 10, 2024. Funds to assist with travel expenses are available but limited, as is space. We expect to notify applicants about acceptance starting 10 days after the application due date.
BeetlePalooza 2024 will bring together an interdisciplinary group interested in using AI/ML to extract scientific knowledge from image data. We expect this group to include AI/ML researchers, data scientists, domain scientists, data curators, tool developers, metadata researchers, and knowledge engineers. This event offers a hands-on, collaborative experience. It is not intended to be a competitive hackathon or a conference. Participants will self-organize into small groups to work hands-on and collaboratively on self-selected targets and outcomes towards the motivations and goals of the event. The process of self-organizing and choosing work targets will be facilitated, but every participant will play an equally active role in making the event a successful, rewarding experience. We aim for an event that will give everyone ample opportunities to contribute their skills and experience, acquire new knowledge, increase technological awareness, and find potential new collaborators. Although the event is primarily designed to create work outcomes, the format will leave room for participant-driven exchange of know-how and skills.
The National Ecological Observatory Network (NEON) collects an unprecedented multitude of ecological and environmental data on a continental scale through field sampling and remote sensing. As part of the field sampling, NEON collects, counts, and identifies biological specimens of environmental indicator species and those filling ecologically important roles. The underlying processing of specimens is often manual and time-consuming, and is limited to taxonomic identification and counts. This presents a unique opportunity, with potentially long-lasting impact, to explore the promise and limitations of AI/ML-driven automation for biodiversity data collection efforts, including by developing and utilizing multi-modal ML computer vision models that take advantage of imagery, remote sensing, and environmental data.
One taxonomic group especially ripe for a proof of concept is beetles, one of the world's most diverse groups, which serve important roles in the pollination of plants and as indicator species providing early warning signals of environmental change. NEON collects ground beetles (Carabidae) from across the continental United States, Puerto Rico, and Alaska using pitfall traps. These specimens are then sorted and identified by NEON staff and other taxonomic experts in a painstaking, manual process that can take over a year and concludes with publishing counts of beetle species on NEON’s data portal. What if, rather than publishing counts of species, NEON captured and published images of beetles? Can we develop an automated process to more efficiently derive species counts from the imagery? Is it possible to use imagery to measure important characteristics (known as functional traits, such as body size) of the different beetles that are collected? This workshop offers an opportunity to develop a proof of concept to demonstrate how the application of computer vision techniques could transform how ground beetle community data are collected, and thus biodiversity data more generally. Moreover, the treasure trove of data products collected at NEON sites provides the opportunity to push multi-modal model development applied to computer vision in tackling this challenge.
We aim to facilitate outcomes that address the potential of and need for ML-ready biological image datasets to extract information about biodiversity, including (but not limited to!) the following:
- A prototype workflow for extracting trait measurements and species identifications from images of NEON beetle specimens. Our goal is for this workflow to be reproducible, to follow FAIR guiding principles, and to be understandable by both biologists and computer scientists.
- Publication of open data products containing species identifications and functional traits derived from beetle specimen images.
- A peer-reviewed publication describing best practices for AI/ML-ready biological specimen data and images. This publication will be accompanied by a white paper to be presented to NEON with advice on how to move forward with efforts to make the Observatory’s data more AI/ML-ready.
We are keeping the scope of possible projects focused on the extraction of species identifications and trait measurements from beetles to maximize the limited time we have in the workshop. That notwithstanding, we expect the event to connect people with domain science-focused goals, such as biologists interested in datasets that help answer biological questions, to people with ML-focused goals, such as ML researchers interested in domain science questions for which to develop algorithms and models.
We generally expect datasets curated at or for the event, as well as tools or methods developed, to satisfy FAIR principles, and where applicable also CARE principles.
The event will be held August 12-15, 2024, at the Imageomics Institute’s headquarters at The Ohio State University, Pomerene Hall, in Columbus, OH.
We aim to bring together a diverse group of people that includes AI and ML researchers and practitioners, as well as ecologists, taxonomists, and related domain scientists, software developers, and data engineers. Members of organizations in the US National Science Foundation (NSF) funded Harnessing the Data Revolution (HDR) ecosystem are particularly encouraged to consider applying, especially members of their respective computer science, computer vision, and machine learning communities.
In general, people encouraged to consider applying include (but are not limited to!) the following:
- AI/ML experts and researchers, particularly those in computer vision (CV), interested in collaboratively advancing tools, infrastructure, and data products.
- Ecologists (and related domain scientists such as biodiversity and environmental scientists) who are interdisciplinary-minded and ideally have already used or are planning to use NEON data in their research, and/or have some familiarity with applying ML and computer vision to ecological research questions.
- Taxonomists or systematists with relevant expertise (such as Carabidae systematics and species identification) and familiarity with Big Data-related challenges.
- Software engineers or programmers with skills requisite for AI/ML (Python and applicable libraries, etc.) and data wrangling/management (SQL, Pandas, R, etc.) who are interested in developing automated ML workflows, tools, and infrastructure.
- Data engineers with experience and expertise in creating FAIR data products suitable for AI/ML applications.
- Graduate students and postdocs looking for an opportunity to develop their skills in interdisciplinary research at the intersection of AI/ML and domain science (ecology, biodiversity and environmental science).
- Advanced undergraduates in computer science (ML/CV), math, or data analytics with demonstrated interest in interdisciplinary research.
Everyone participating in the event must adhere to its Code of Conduct.
The Imageomics Institute is funded by the US National Science Foundation (NSF) within its Harnessing the Data Revolution (HDR) Institute program. Its vision is to create a collaborative research, training, and community-facing environment for extracting known and discovering new biological traits from images, with the necessary infrastructure for cyber, information, and model development. The Institute will advance Imageomics-enabled biology, accelerate innovations in machine learning, and create digital resources for the researchers and practitioners in biology, data science, and machine learning, as well as the broader scientific community. It will further interdisciplinary training and education, and engage the broader public in the scientific process.
The NSF’s National Ecological Observatory Network (NEON) is a continental-scale observation facility operated by Battelle and designed to collect long-term open access ecological data to better understand how ecosystems in the United States are changing. NEON collects data and specimens using an extensive network of thousands of automated instruments and hundreds of field technicians, as well as through airborne remote sensing, at 81 field sites located in 20 ecoclimatic domains across the United States, including 47 terrestrial and 34 freshwater aquatic field sites.
Hilmar Lapp (Duke University & Imageomics Institute)
Michelle Ramirez (The Ohio State University & Imageomics Institute)
Sydne Record (University of Maine)
Eric Sokol (National Ecological Observatory Network, Battelle)