CARE Publications

Explore our research publications across various AI safety domains and conferences.

All Publications

4 publications

Latest
Oldest
A-Z
Z-A

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLR
2025
Data/Computing Social Good Risk Assessment

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
ICLR
2025
Data/Computing Law/Policy Social Good Risk Assessment

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen, Francesco Pinto, Minzhou Pan, Bo Li
ICLR
2025
Data/Computing Algorithms Law/Policy Social Good Safety Enhancement Risk Assessment

R2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li
ICLR
2025
Algorithms Law/Policy Social Good

AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

Mintong Kang, Chejian Xu, Bo Li
ICLR
2025
Algorithms Risk Assessment

On Memorization of Large Language Models in Logical Reasoning

Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar
arXiv
2025
Risk Assessment Science Data/Computing

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li
NeurIPS
2024
Algorithms Risk Assessment

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li
ICML
2024
Algorithms Theoretical Guarantees

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou
ICML
2024
Algorithms Theoretical Guarantees Safety Enhancement

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
NeurIPS
2023
Data/Computing Social Good Risk Assessment