Research and Collaborative Projects
Through open innovation and collaboration, the Data Science Institute bridges the gap between specialized research tools and scientific advancements. Whether researchers need help implementing AI models, refining analytical methods, or exploring new data science applications, Data Science Institute experts offer insights and technical support to keep them at the forefront of data-driven discovery. The Data Science Institute uses many common research tools to help researchers integrate advanced AI technologies and research software into their collaborative projects.
To schedule a consultation to discuss how the Data Science Institute can help with your current or planned project or for more information, email the experts at rii-datascienceinstitute@arizona.edu.
An open-source, in-house AI pipeline created by the Data Science Institute at the University of Arizona, AI Verde is designed to act as a one-stop shop for all research and teaching needs of a university campus, with a focus on accuracy, privacy preservation, and intellectual property protection. AI Verde meets users where they are, with training workshops that range from introductory programming through advanced AI.
PI and/or campus faculty contact: Nirav Merchant, Mithun Paul, Enrique Noriega, Edwin Skidmore
This project aims to transform agriculture by developing advanced Cyber-Physical Systems (CPS) that enable farmers to address crop stressors with lower costs, improved agility, and reduced environmental impact, using AI, machine learning, and robotics for individual plant-level sensing, modeling, and reasoning. The research focuses on data-driven estimation, robust machine learning, and autonomous robot coordination, while also promoting outreach to diverse communities and making research outcomes publicly accessible. CyVerse is a collaborative partner on COALESCE (NSF award #1954556).
Co-PI: Nirav Merchant
The AI Institute for Resilient Agriculture (AIIRA) aims to harness artificial intelligence (AI) and digital twin technology to address the growing challenges of food security by improving crop resilience, boosting yields, and enhancing sustainable farming practices. By bringing together experts from various fields, AIIRA seeks to revolutionize agriculture, helping farmers, governments, and businesses adapt to climate variability while driving economic development in rural areas. CyVerse is a collaborative partner on AIIRA (NSF and USDA NIFA award #2021-67021-35329).
Co-PI: Nirav Merchant
AEOLUS will leverage cloud-native services and state-of-the-art computational resources, including ACCESS and CyVerse, over Internet2 to allow researchers at universities, governmental research labs, and industry to work with sUAS data in cloud-native environments, thus enabling data-intensive science.
PI and/or campus faculty contact: Dr. Tyson Swetnam
This study involves repeated neurocognitive assessments and neuroimaging. Our specific aims are to 1) create a multiscale model of cognitive fatigue and 2) develop novel mathematical tools that enable user-friendly programs to assess circadian misalignment and cognitive fatigue.
PI and/or campus faculty contact: Dr. William Killgore
XNAT
This project aims to address the increasing unpredictability of extreme water-related events like floods, droughts, and wildfires, which have caused significant damage in recent years. HydroGEN, a web-based machine learning platform, allows water managers to create and explore custom hydrologic scenarios without prior modeling experience, helping them navigate the uncertainty of future risks. CyVerse is a collaborative partner on HydroGEN (NSF award #2134892).
Co-PI: Nirav Merchant
This project will develop 1) a clinically relevant classification system that integrates clinical, radiographic, genomic, histopathology, and spatial molecular data and 2) PITMAP (PITuitary Molecular Atlas Project), a publicly available spatial molecular atlas for PitNET that will enable investigators to cross-reference spatial tissue imaging and genetic data with clinical information to develop a next-generation WHO classification of PitNETs.
PI and/or campus faculty contact: Dr. Yana Zavros
The goal of the Individual Molecular Registry of Patients for Accelerated Clinical and Translational Medicine project (Soteria) is to better understand the underlying relationships driving various manifestations of cancer. We will do this by analyzing multiple data types provided by Caris Biosciences (ordered by UA Banner Health clinical oncologists). To support this, we need to automate the extraction, anonymization, and subsequent storage of PHI and PII data from clinical files provided by Caris Biosciences, particularly as the dataset grows.
PI and/or campus faculty contact: Dr. Ritu Pandey
Jetstream2 is a cloud-based cyberinfrastructure system designed to support diverse, on-demand computational needs, leveraging cutting-edge technology to enhance research and educational outcomes. Led by Indiana University and partnering with multiple institutions, including CyVerse, it will advance the national cyberinfrastructure ecosystem by providing accessible AI capabilities and training, while empowering students and researchers across disciplines to participate in the evolving STEM workforce. Jetstream2 is supported by NSF award #2005506.
Co-PI: Nirav Merchant
XNAT will support new investigators in the collection, evaluation, processing and analysis of MRI data. This project will help introduce investigators to XNAT and get them started on their MRI projects. Autonomous projects will be rolled out to become full XNAT projects as investigators gain familiarity with the system.
PI and/or campus faculty contact: Theodore Trouard
The Kidney Donor Data Pilot (Soteria) focuses on four data points: kidney anatomy, procurement biopsy, machine perfusion, and donation after circulatory death (DCD) organ recovery data. These will be used to build a dataset spanning multiple years of archived files. The project aims to automatically identify and extract data from deceased-donor PDF attachments, which is not attainable with current commercial or open-source products, and to build a structured, analyzable dataset from DonorNet’s PDF attachments. This dataset will contain donor post-recovery data (anatomy, biopsy, PMP, DCD), which plays a critical role in kidney utilization.
PI and/or campus faculty contact: Dr. Bekir Tanriover
Our current system archives all PHI and sensor data onto MDH beyond the scope of this project. To provide a true repository of sensor data for teaching, future research, and grant applications, we need a secure resource for archiving PHI and sensor data that is searchable and cataloged.
PI and/or campus faculty contact: Dr. Shravan Aras
Fatty infiltration has been shown to be a poor prognostic factor for repair of the cuff tendons as well as a marker for recurrent tears. This pilot project investigates the correlation between muscle fat infiltration and the outcome of surgical rotator cuff tear repair. As part of this IRB-approved project, radiologists will be able to access the images on XNAT and analyze them using the OHIF plugin.
PI and/or campus faculty contact: Spencer Knight
In Arizona, the incidence of melanoma and non-melanoma skin cancer is higher than average US rates. With the increased rate of skin cancers and the high costs of treatment, more accurate and affordable minimally invasive diagnostic approaches are needed. There is an opportunity to create a unique and robust repository of matched imaging and tissue, which would support novel multimodal imaging and tissue analytics alongside the development of minimally invasive technology. The development of a statewide database has the potential to shift current skin cancer diagnostic practice.
PI and/or campus faculty contact: Dr. Clara Curiel
OMERO
Space4 and the Data Science Institute will develop the Space4 Platform to automate and efficiently track objects of interest in support of the University of Arizona Space Domain Awareness Program, using current technologies.
PI and/or campus faculty contact: Dr. Roberto Furfaro
Develop a user interface (UI) that displays and supports searching the output of a large-scale information extraction system designed for protein-protein interactions related to C. difficile. Clostridioides difficile is an important pathogen of both human and veterinary populations, and it is being actively studied by medical and agricultural researchers.
PI and/or campus faculty contact: Dr. Fiona McCarthy
The PhytoOracle team generates JSON-formatted index files for these data sets. Using these index files, we generate new indexable JSON files and upload them to the OpenSearch server so that they can be indexed and searched. OpenSearch is a search engine that provides advanced search capabilities and data analytics. The indexed files are also used to create OpenSearch dashboards.
PI and/or campus faculty contact: Duke Pauli
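As a rough illustration of the indexing step described above, the sketch below turns entries from a JSON index file into the newline-delimited payload that the OpenSearch bulk endpoint accepts (an action line followed by a document line for each entry). The entry fields, the `id` key, and the index name are hypothetical; the actual PhytoOracle index layout is not described here.

```python
import json


def to_bulk_ndjson(index_entries, index_name):
    """Build an NDJSON body for the OpenSearch _bulk endpoint.

    index_entries: list of dicts read from a JSON index file (layout assumed).
    index_name: target OpenSearch index (hypothetical name supplied by caller).
    """
    lines = []
    for entry in index_entries:
        # Action line: tells OpenSearch to index the document that follows.
        action = {"index": {"_index": index_name}}
        if "id" in entry:
            # Assumption: entries may carry their own identifier.
            action["index"]["_id"] = str(entry["id"])
        lines.append(json.dumps(action))
        # Document line: the entry itself, one JSON object per line.
        lines.append(json.dumps(entry, sort_keys=True))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical entries standing in for a real index file.
    entries = [
        {"id": 1, "sensor": "rgb", "path": "scans/day1/plot_0101.tif"},
        {"id": 2, "sensor": "thermal", "path": "scans/day1/plot_0102.tif"},
    ]
    print(to_bulk_ndjson(entries, "example-scans"), end="")
```

The resulting string would be sent as the body of a POST to the server's `/_bulk` endpoint with the `application/x-ndjson` content type. Batching entries this way avoids one HTTP request per document.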
MDRepo is built on top of the CyVerse Data Store, and is thus capable of supporting petabyte-scale storage, high throughput upload/download, data redundancy, and cloud computing access. In addition to providing a centralized location for storage and access of MD simulations, MDRepo lays the foundation for the next generation of ML methods for inference of structure, dynamics, and interactions. We will expand the functionality and utility of MDRepo, enabling a new wave of AI methods that improve understanding of biomolecule dynamics and interactions.
PI and/or campus faculty contact: Travis Wheeler
Initiative to facilitate the development of environmental science.
PI and/or campus faculty contact: Jennifer Balch (CU Boulder) & Dr. Tyson Swetnam
Researching the use of LLM agents to find niche areas where they can help our researcher colleagues. Examples include text2sql and generating structured summaries of collections of documents, among others. The initial goal is to have prototypes ready to leverage in the search for funding sources. Specific deliverables are a series of code repositories and technical reports. We want to leverage the AI Verde LLM to power agents.
PI and/or campus faculty contact: Nirav Merchant & Enrique Noriega
Cross-reference and link 2-3 large tables of patient/student records using closest match (name, date of birth, etc.). The deliverable is a single combined data table. The work takes place in a very secure environment with a PHI dataset: a remote, secured Windows system (provided by Dept of Edu) with no internet access besides the VPN login.
PI and/or campus faculty contact: Dr. Frost and Hagan Franks
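A minimal sketch of closest-match linking for the record-linkage project above, assuming each record is a dict with hypothetical `id`, `name`, and `dob` fields. It uses only the Python standard library, which suits a system without internet access. The scoring rule (exact date-of-birth match, then fuzzy name similarity) and the 0.85 threshold are illustrative choices, not the project's actual method.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not affect scoring."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Score two records in [0, 1]: fuzzy name similarity, gated on an exact dob match."""
    if a["dob"] != b["dob"]:
        return 0.0
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def link_tables(left, right, threshold=0.85):
    """Attach to each left record the best-scoring right record at or above threshold."""
    combined = []
    for rec in left:
        best, best_score = None, 0.0
        for cand in right:
            score = similarity(rec, cand)
            if score > best_score:
                best, best_score = cand, score
        match = best if best_score >= threshold else None
        combined.append({
            **rec,
            "matched_id": match["id"] if match else None,
            "score": round(best_score, 3),
        })
    return combined


if __name__ == "__main__":
    # Made-up records for illustration only.
    patients = [
        {"id": "P1", "name": "Maria  Lopez", "dob": "2001-04-12"},
        {"id": "P2", "name": "John Smith", "dob": "1999-01-30"},
    ]
    students = [
        {"id": "S9", "name": "maria lopez", "dob": "2001-04-12"},
        {"id": "S7", "name": "Jon Smith", "dob": "1999-01-30"},
        {"id": "S3", "name": "John Smith", "dob": "1998-01-30"},
    ]
    for row in link_tables(patients, students):
        print(row)
```

This compares every pair of records, so the cost grows with the product of the table sizes. For large tables, grouping candidates by a shared key first (for example, date of birth) would cut the number of comparisons considerably.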
The project aims to create a robust connected transportation system where infrastructure operators gather and process real-time data from various sources (like sensors, traffic signals, and data services) and then share this information with vehicles and travelers through V2X communications. In essence, the goal is to ensure that critical traffic and roadway information, ranging from incident alerts and work zone updates to signal timing and roadway layouts, is delivered accurately, reliably, and promptly to enhance overall traffic management and safety.
PI and/or campus faculty contact: Dr. Larry Head with Edwin Skidmore and Hagan Franks
ATSM is an automated system to deliver PRO-CTCAE surveys to oncology patients as part of an ongoing study with UA College of Nursing and other university oncology programs.
PI and/or campus faculty contact: Angela Young, Molly Hadeed, Alla Sikorskii
Develop a leaf segmentation pipeline using CyVerse infrastructure and write a publication.
PI and/or campus faculty contact: Aikseng Ooi, Nirav Merchant, and Michelle Yung
During the summer of 2024, CyVerse sponsored a KEYS project for the high school student, Tanmay Dewangan. This project demonstrated the viability of implementing the CyVerse Data Commons using CKAN. The Data Commons v3 project will turn this proof of concept into a minimum viable product (MVP). The MVP will recreate the current Data Commons with CKAN. It will also provide a means for authorized projects to migrate their data to Data Commons and curate it, reducing the burden on CyVerse and DSI personnel.
PI and/or campus faculty contact: Nirav Merchant and Tony Edgin
The iRODS rule queue monitoring service ran as a Jenkins job. The Jenkins server hosting it was decommissioned, so we need to re-establish this service using Prometheus.
PI and/or campus faculty contact: Tony Edgin
Develop and document a plan to recover the CyVerse Data Store in the case of total loss of the Data Store hardware located at the U of A. This should not include project-specific storage resources.
PI and/or campus faculty contact: Tony Edgin
Upgrade our iRODS grids to 4.3.1. Also make it possible for CyVerse Austria and RFHS to upgrade their grids using our playbooks.
PI and/or campus faculty contact: Tony Edgin
Demonstrate the viability of allowing CyVerse users to store data in an S3 bucket hosted by a cloud provider such as Amazon while maintaining full access to the data through the Data Store.
PI and/or campus faculty contact: Tony Edgin
Minimize the Data Store downtime required to switch to the ICAT DB replica server when the master fails.
PI and/or campus faculty contact: Tony Edgin
U of A and UA Health Sciences 5.3 Strategic Initiative for a health analytics powerhouse using research data science tools and technologies in five major research areas: Data, Analytics, Community Services, Emerging Technologies, and Education and Training.
PI and/or campus faculty contact: Nirav Merchant and Maliaca Oxnam