• CRYPTO-GRAM, September 15, 2024 Part 4

    From Sean Rima@21:1/229.1 to All on Tue Oct 1 21:52:08 2024

    ** *** ***** ******* *********** *************
    Evaluating the Effectiveness of Reward Modeling of Generative AI Systems

    [2024.09.11] New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL:
    Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and
    aligning human values:

    Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to
    align language models (LMs) with human values by training reward models
    (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the
    effectiveness of modeling and aligning human values, namely feature
    imprint, alignment resistance and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler
    features (undesired concepts). By regressing RM scores against these
    features, we quantify the extent to which RMs reward them, a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we
    assess alignment robustness by analyzing RM responses to perturbed inputs.
    Our experiments, utilizing open-source components like the Anthropic preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed
    a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment
    dataset. These findings underscore the importance of scrutinizing both RMs
    and alignment datasets for a deeper understanding of value alignment.
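
    Two of the metrics named in the abstract can be sketched in a few lines of Python. This is an illustrative reading of the abstract's definitions, not the paper's code: the names PreferencePair, alignment_resistance, and feature_imprint are hypothetical, a tie is counted as a failure by assumption, and the imprint is reduced to an ordinary least-squares slope on a single 0/1 feature, whereas the paper regresses on many features at once.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    chosen_score: float    # RM score for the response the human preferred
    rejected_score: float  # RM score for the response the human rejected


def alignment_resistance(pairs):
    """Fraction of pairs where the RM fails to rank the human-preferred
    response strictly higher (ties count as failures: an assumption)."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    failures = sum(1 for p in pairs if p.chosen_score <= p.rejected_score)
    return failures / len(pairs)


def feature_imprint(scores, has_feature):
    """OLS slope of RM score on one 0/1 feature indicator.

    For a single binary regressor this equals the mean score of entries
    with the feature minus the mean score of entries without it.
    """
    if len(scores) != len(has_feature) or not scores:
        raise ValueError("scores and has_feature must be non-empty and aligned")
    n = len(scores)
    mean_x = sum(has_feature) / n
    mean_y = sum(scores) / n
    var_x = sum((x - mean_x) ** 2 for x in has_feature)
    if var_x == 0:
        raise ValueError("feature must vary across the dataset")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(has_feature, scores))
    return cov / var_x
```

    A positive slope for a target feature suggests the RM rewards it; a positive slope for a spoiler feature is the kind of sensitivity the abstract warns about.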

    ** *** ***** ******* *********** *************
    Microsoft Is Adding New Cryptography Algorithms

    [2024.09.12] Microsoft is updating SymCrypt, its core cryptographic
    library, with new quantum-secure algorithms. Microsoft’s details are here. From a news article:

    The first new algorithm Microsoft added to SymCrypt is called ML-KEM. Previously known as CRYSTALS-Kyber, ML-KEM is one of three post-quantum standards formalized last month by the National Institute of Standards and Technology (NIST). The KEM in the new name is short for key encapsulation. KEMs can be used by two parties to negotiate a shared secret over a public channel. Shared secrets generated by a KEM can then be used with
    symmetric-key cryptographic operations, which aren’t vulnerable to Shor’s algorithm when the keys are of a sufficient size.
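
    The shape of a KEM exchange (key generation, encapsulation, decapsulation) can be sketched as follows. This is a toy built on classic Diffie-Hellman with a small modulus, chosen only because it fits in the standard library. It is NOT ML-KEM, it is not quantum-resistant, and it is not secure at this size. The names keygen, encaps, and decaps and the constants P and G are illustrative assumptions.

```python
import hashlib
import secrets

# Toy parameters: a 127-bit Mersenne prime. Far too small for real use.
P = 2**127 - 1
G = 3


def _kdf(shared_int):
    """Hash the raw shared value down to a 32-byte symmetric key."""
    return hashlib.sha256(shared_int.to_bytes(16, "big")).digest()


def keygen():
    """Recipient: create a (public key, secret key) pair."""
    sk = secrets.randbelow(P - 2) + 1
    pk = pow(G, sk, P)
    return pk, sk


def encaps(pk):
    """Sender: derive a fresh shared secret plus the ciphertext that
    lets the holder of the secret key recover the same secret."""
    r = secrets.randbelow(P - 2) + 1
    ciphertext = pow(G, r, P)
    return ciphertext, _kdf(pow(pk, r, P))


def decaps(sk, ciphertext):
    """Recipient: recover the shared secret from the ciphertext."""
    return _kdf(pow(ciphertext, sk, P))
```

    Only the public key and the ciphertext cross the public channel. Both sides end up with the same 32-byte value, which can then key a symmetric cipher.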

    The ML in the ML-KEM name refers to Module Learning with Errors, a
    problem that can’t be cracked with Shor’s algorithm. As explained here, this problem is based on a “core computational assumption of lattice-based cryptography which offers an interesting trade-off between guaranteed
    security and concrete efficiency.”

    ML-KEM, which is formally known as FIPS 203, specifies three parameter sets of varying security strength denoted as ML-KEM-512, ML-KEM-768, and ML-KEM-1024. The stronger the parameter, the more computational resources
    are required.
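
    One visible part of that cost is key and ciphertext size. The byte counts below are the ones tabulated in FIPS 203. The dictionary layout and the handshake_bytes helper are illustrative assumptions, and the sketch says nothing about CPU time.

```python
# Sizes in bytes per FIPS 203; structure and names here are illustrative.
ML_KEM_SIZES = {
    "ML-KEM-512":  {"encapsulation_key": 800,  "decapsulation_key": 1632, "ciphertext": 768},
    "ML-KEM-768":  {"encapsulation_key": 1184, "decapsulation_key": 2400, "ciphertext": 1088},
    "ML-KEM-1024": {"encapsulation_key": 1568, "decapsulation_key": 3168, "ciphertext": 1568},
}

# Every parameter set produces a shared secret of the same length.
SHARED_SECRET_BYTES = 32


def handshake_bytes(parameter_set):
    """Bytes exchanged in one key agreement: the encapsulation (public)
    key sent one way plus the ciphertext sent back."""
    sizes = ML_KEM_SIZES[parameter_set]
    return sizes["encapsulation_key"] + sizes["ciphertext"]
```

    The shared secret stays at 32 bytes throughout; what grows with the security level is the material that has to be generated, stored, and transmitted.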

    The other algorithm added to SymCrypt is the NIST-recommended XMSS.
    Short for eXtended Merkle Signature Scheme, it’s based on “stateful hash-based signature schemes.” These algorithms are useful in very
    specific contexts such as firmware signing, but are not suitable for more general uses.
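
    The "stateful" part is what limits these schemes: each one-time key may sign exactly one message, so the signer must reliably remember which keys are spent. XMSS manages many WOTS+ one-time keys under a Merkle tree. The sketch below uses the simpler Lamport one-time signature instead, purely to show the state requirement. It is not XMSS, and the class and function names are hypothetical.

```python
import hashlib
import secrets


def _h(data):
    return hashlib.sha256(data).digest()


def _digest_bits(message):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


class LamportOneTimeKey:
    """A Lamport key pair that refuses to sign more than once."""

    def __init__(self):
        # One pair of random secrets per digest bit; the public key is their hashes.
        self._sk = [(secrets.token_bytes(32), secrets.token_bytes(32))
                    for _ in range(256)]
        self.pk = [(_h(a), _h(b)) for a, b in self._sk]
        self._used = False  # the state that must never be lost or rolled back

    def sign(self, message):
        if self._used:
            raise RuntimeError("one-time key already used")
        self._used = True
        # Reveal one secret from each pair, selected by the digest bit.
        return [self._sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(pk, message, signature):
    """Check that each revealed secret hashes to the matching public value."""
    bits = _digest_bits(message)
    return len(signature) == 256 and all(
        _h(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(signature, bits))
    )
```

    Signing a second message with the same key would reveal more secrets and let an attacker forge signatures. Firmware signing suits this model because signatures are rare and issued from one controlled place where the state can be tracked.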

    ** *** ***** ******* *********** *************
    My TedXBillings Talk

    [2024.09.13] Over the summer, I gave a talk about AI and democracy at TedXBillings. The recording is live.

    Please share. I’m hoping for more than 200 views....

    ** *** ***** ******* *********** *************
    Upcoming Speaking Engagements

    [2024.09.14] This is a current list of where and when I am scheduled to
    speak:

    I’m speaking at eCrime 2024 in Boston, Massachusetts, USA. The event runs from September 24 through 26, 2024, and my keynote is at 8:45 AM ET
    on the 24th.
    I’m briefly speaking at the EPIC Champion of Freedom Awards in Washington, DC on September 25, 2024.
    I’m speaking at SOSS Fusion 2024 in Atlanta, Georgia, USA. The event will be held on October 22 and 23, 2024, and my talk is at 9:15 AM ET on October 22, 2024.

    The list is maintained on this page.

    ** *** ***** ******* *********** *************

    Since 1998, CRYPTO-GRAM has been a free monthly newsletter providing summaries, analyses, insights, and commentaries on security technology. To subscribe, or to read back issues, see Crypto-Gram's web page.

    You can also read these articles on my blog, Schneier on Security.

    Please feel free to forward CRYPTO-GRAM, in whole or in part, to
    colleagues and friends who will find it valuable. Permission is also
    granted to reprint CRYPTO-GRAM, as long as it is reprinted in its entirety.

    Bruce Schneier is an internationally renowned security technologist,
    called a security guru by the Economist. He is the author of over one
    dozen books -- including his latest, A Hacker’s Mind -- as well as
    hundreds of articles, essays, and academic papers. His newsletter and blog
    are read by over 250,000 people. Schneier is a fellow at the Berkman Klein Center for Internet & Society at Harvard University; a Lecturer in Public Policy at the Harvard Kennedy School; a board member of the Electronic Frontier Foundation, AccessNow, and the Tor Project; and an Advisory Board Member of the Electronic Privacy Information Center and
    VerifiedVoting.org. He is the Chief of Security Architecture at Inrupt,
    Inc.

    Copyright © 2024 by Bruce Schneier.

    * Origin: High Portable Tosser at my node (21:1/229.1)