
The Chinese AI Safety Network is a cooperation platform for AI safety across China. It brings together AI safety and security efforts at all levels, serves as a platform for dialogue, mapping, interoperability, and collaboration within China, and connects and contributes to global coordination and cooperation. Current members of the Chinese AI Safety Network include AI-safety-related labs, institutions, and industrial efforts from Chinese Academy of Sciences (CAS), China Academy of Information and Communications Technology (CAICT), Peking University (PKU), Tsinghua University (THU), Shanghai AI Lab, Beijing Academy of AI (BAAI), Beijing Institute of AI Safety and Governance (Beijing-AISI), Hongkong Chinese University, Hongkong Lingnan University, China Information Technology Security Evaluation Center, Center for Long-term AI (CLAI), Alibaba, Ant Group, and SenseTime, among others.

Large Language Models (LLMs) are vulnerable to jailbreak attacks that bypass safety mechanisms. We introduce a scalable attack that preempts safety policies by occupying computational resources. Our method engages the LLM in a resource-intensive task—a Character Map lookup and decoding process—before presenting the target instruction, preventing safety protocols from activating. Our method provides scalable control over attack strength. These findings reveal that LLM safety mechanisms are vulnerable to resource constraints, highlighting the need for more robust defense strategies.

Introducing "Jailbreak Antidote", a breakthrough method for real-time safety control in large language models (LLMs). Unlike traditional defenses that add computational overhead, our approach adjusts a sparse subset of the model’s internal states during inference, balancing safety and utility without delays. By shifting hidden representations, we achieve effective protection against jailbreak attacks while maintaining model performance. Validated on nine LLMs and ten attack methods, Jailbreak Antidote offers a lightweight, scalable solution for safer, more flexible AI deployments.
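The idea of adjusting only a sparse subset of internal states can be illustrated with a minimal sketch. It is not the Jailbreak Antidote implementation: the function name `sparse_shift`, the parameters `alpha` and `keep_ratio`, and the rule for choosing dimensions (largest magnitude in a given safety direction) are all assumptions made for illustration, and the hidden state is a plain Python list rather than a real model tensor.

```python
def sparse_shift(hidden, safety_direction, alpha=1.0, keep_ratio=0.05):
    """Shift a hidden state along a safety direction on a sparse subset of dimensions.

    Hypothetical helper: only the dimensions where the direction has the
    largest magnitude are modified; all other dimensions are left untouched.
    """
    if len(hidden) != len(safety_direction):
        raise ValueError("hidden and safety_direction must have the same length")

    # Number of dimensions to adjust (at least one).
    k = max(1, int(len(hidden) * keep_ratio))

    # Indices of the k largest-magnitude components of the direction.
    top = sorted(
        range(len(safety_direction)),
        key=lambda i: abs(safety_direction[i]),
        reverse=True,
    )[:k]

    shifted = list(hidden)
    for i in top:
        # alpha scales how strongly the state is pushed toward "safe".
        shifted[i] += alpha * safety_direction[i]
    return shifted


state = [0.0, 0.0, 0.0, 0.0]
direction = [0.1, -2.0, 0.5, 3.0]
print(sparse_shift(state, direction, alpha=2.0, keep_ratio=0.5))
```

In this sketch a larger `alpha` pushes the state further along the safety direction, while `keep_ratio` controls how many dimensions are touched, which is one way to picture the trade-off between safety and utility described above.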

All stakeholders involved in the design, development, research, deployment, and use of AI, including policymakers, not only have the obligation to drive progress with cutting-edge AI technologies, but also shoulder the responsibility of anticipating, identifying, and addressing potential risks and safety hazards. This ensures that AI products and services are safe, secure, reliable, controllable, and governable throughout their entire lifecycle.

"StressPrompt" explores how stress impacts Large Language Models (LLMs) similarly to humans. The study reveals that moderate stress enhances LLM performance, aligning with the Yerkes-Dodson law, while high or low stress impairs it. Stress prompts significantly alter LLMs’ internal states, offering insights into AI resilience and robustness.

We explore different concepts of AI existential risk, connect the enactment of AI red lines to broader efforts addressing AI's impacts, construct a theoretical framework for analyzing the direct impacts of AI existential risk, and on that basis propose a set of exemplary AI red lines. By contemplating AI existential risks and formulating these red lines, we aim to foster a deeper and systematic understanding of the potential dangers associated with advanced AI and the importance of proactive risk management. Learn more.
Yaodong Yang's team from Peking University has developed Aligner, a model-agnostic, plug-and-play module that can be applied to any powerful, large-scale upstream model to improve its alignment performance. Aligner has been featured on the cover of MIT Tech Review. Currently, Aligner-2B on GPT-4 Turbo ranks first on the Alpaca-Eval leaderboard. Learn more about Aligner.
Artificial intelligence (AI) is undoubtedly a driving force for the advancement of society, particularly as one of the key enabling technologies to advance global sustainable development. However, this does not mean that AI does not carry potential risks or that the need to maximize the benefits of AI should lead us to ignore its potential risks. Paying attention to and controlling the safety risks of AI is not to hinder its development and applications, but to ensure steady and healthy development of AI.
Safety-Gymnasium is a highly modular, minimally readable and easily customizable benchmark environment library based on MuJoCo to facilitate research in the Safe Reinforcement Learning domain. It also supports and refactors the classic Safe RL environments Safety-Gym and Safety-Velocity.
Humans often unconsciously perceive social robots involved in their lives as partners rather than mere tools, imbuing them with qualities of companionship. This anthropomorphization can lead to a spectrum of emotional risks, such as deception, disappointment, and reverse manipulation, that existing approaches struggle to address effectively. We argue a Virtual Interactive Environment (VIE) exists between humans and social robots, which plays a crucial role and demands necessary consideration and clarification in order to mitigate potential emotional risks.
The United Nations Office for Disarmament Affairs (UNODA) and the European Commission co-hosted a workshop on "Ethics and Emerging Technologies in Weapons Systems" in April 2022. The director of the Center for Long-term AI, Prof. Yi Zeng, was invited as a speaker. A recording of his speech is available here.
This research analysis of voices from China regarding “Pause Giant AI Experiments: An Open Letter” aims to reflect the perceptions and attitudes of people in China toward the letter. The results show that more than 90% of the Chinese participants chose to “support the implementation of ethics, safety and governance framework for every large AI model used in social services”.

Align-Anything aims to align large models of any modality (any-to-any models), including LLMs, VLMs, and others, with human intentions and values. The framework is highly modular and supports fine-tuning of various models, fine-tuning across any modality, and different alignment methods.
Learn more

The future of Artificial Intelligence is as vast and uncharted as the sea, requiring us humans to navigate forward with great care and caution. We hope the SEA Platform Network will empower AI to be safer and more ethical, enabling better governance of future AI just as humans harness the immense power of the sea. The SEA Platform Network is committed to ensuring that AI is safe and ethical, and hopes it can be used to ensure that AI systems and services are beneficial to humanity.

Beaver is a highly modular open-source RLHF framework developed by the PKU-Alignment team at Peking University. It aims to provide training data and a reproducible code pipeline for alignment research, especially constrained alignment LLM research via Safe RLHF methods. It supports SFT, RLHF, and Safe RLHF training for popular pre-trained models such as LLaMA, OPT, and Baichuan, and provides a large human-labeled dataset (up to 1M pairs) covering both helpful and harmless preferences to support reproducible RLHF research, among other features.

Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. The benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets.
Dandelion Open Lab for AI Governance includes evaluation on AI safety and security. It is a platform from Shanghai AI Lab. It is built as an AI governance framework that integrates regulation-technology-scenario-evaluation, and is committed to creating a systematic and practical AI ethics and governance infrastructure, serving the implementation of governance principles, and supporting the improvement of the efficiency of AI governance.
OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms, and also an out-of-box modular toolkit for researchers. Safe RL intends to develop algorithms that minimize the risk of unintended harm or unsafe behavior.
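One common way to formalize the goal of safe RL is as a constrained problem: maximize expected reward while keeping expected cost under a budget, with a Lagrange multiplier that grows whenever the budget is exceeded. The sketch below illustrates only that multiplier update. It does not use or reproduce the OmniSafe API; the function names, the learning rate, and the numbers are hypothetical.

```python
def update_multiplier(lmbda, avg_cost, cost_limit, lr=0.1):
    """Dual ascent step on the Lagrange multiplier (hypothetical helper).

    The multiplier rises when the average episode cost exceeds the limit
    and falls otherwise, but is never allowed to go below zero.
    """
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def penalized_return(avg_reward, avg_cost, lmbda):
    """Objective a policy update would maximize: reward minus weighted cost."""
    return avg_reward - lmbda * avg_cost


# Illustrative numbers only: the policy currently violates the cost limit,
# so the multiplier increases and cost is penalized more heavily next round.
lmbda = update_multiplier(0.0, avg_cost=30.0, cost_limit=25.0, lr=0.1)
print(lmbda, penalized_return(avg_reward=10.0, avg_cost=30.0, lmbda=lmbda))
```

In a full algorithm the policy would be improved against `penalized_return` between multiplier updates; that policy step is omitted here.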
Defense AI and Arms Control Network is a network for enabling expert discussions and idea exchanges in the field of Defense AI and Arms Control, Military AI Ethics, Governance and Global Cooperation, and AI for Peace. It also provides online services to synthesize global discussions on these topics from complementary perspectives, and provides a global view to promote AI for peace.
The Multi Modal Big Model Safety Norms Series is led by China Academy of Information and Communications Technology (CAICT), with participation from the Ant Group, Baidu, Huawei, University of Chinese Academy of Sciences, China Telecom, etc. Major contents of this norm series include, but are not limited to, value alignment, avoiding misinformation and disinformation, and other safety issues. The norm series is in the draft phase, and two drafting working meetings have been held.
The Big Model System Safety Protection Requirement standard is a community standard led by the Cybersecurity Classified Protection Evaluation Center at the Ministry of Public Security. Its content includes, but is not limited to, Big Model safety and security frameworks, especially general safety and security and full life cycle safety and security.
The Big Model System Safety Evaluation Requirement standard is a community standard led by the Cybersecurity Classified Protection Evaluation Center at the Ministry of Public Security. Its content includes, but is not limited to, evaluation methods, especially data cleaning, leak scanning, attack evaluation, content security, and robustness evaluation.
The AI Safety Guard Initiative is co-established by the Institute of AI at CAICT and the Safety and Governance Committee of the AI Industry Alliance (AIIA) of China. Its major activity is to share information, incidents, and leaks related to AI safety and security. Safety and security risks will be evaluated and reported to relevant governance ministries in China.