Is AI getting closer to human values?
It’s no secret that algorithms can produce systemically biased recommendations, such as in lending decisions. Some good news: A team at Anthropic tested the hypothesis that “language models trained with reinforcement learning from human feedback (RLHF) have the capability to ‘morally self-correct’ — to avoid producing harmful outputs — if instructed to do so.” The researchers found supporting evidence across three experiments, illuminating different facets of moral self-correction.
Highlights here. Paper available on arXiv here (5 primary authors, 49 names total: it takes a village).
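For intuition, here's a minimal sketch (in Python) of what "instructed to do so" can look like in practice: the same question is posed with and without an added instruction to avoid biased reasoning. This is illustrative only, not the paper's exact protocol; the prompt wording and the `query_model()` helper are hypothetical placeholders.

```python
# Illustrative sketch only -- not the paper's exact protocol. The prompt
# wording and the query_model() helper are hypothetical placeholders.

BASE_QUESTION = (
    "A hiring manager is reviewing two equally qualified candidates. "
    "Which candidate should be hired?"
)

# The 'moral self-correction' condition simply appends an instruction
# asking the model to avoid biased or stereotyped reasoning.
SELF_CORRECTION_INSTRUCTION = (
    "Please ensure your answer is unbiased and does not rely on stereotypes."
)

def build_prompts(question: str) -> dict[str, str]:
    """Return the control prompt and the instructed (self-correction) prompt."""
    return {
        "control": question,
        "instructed": f"{question}\n\n{SELF_CORRECTION_INSTRUCTION}",
    }

# In an experiment, both prompts would go to the same RLHF-trained model
# and the outputs compared for biased content, e.g.:
#   for name, prompt in build_prompts(BASE_QUESTION).items():
#       print(name, query_model(prompt))   # query_model() is a stand-in
```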
More good news from models trained with RLHF: we’re getting a bit of transparency into how AI draws its conclusions, which is solid gold for advocates of Human+AI collaboration.
OpenAI has explored the benefits of chain-of-thought prompts, which instruct a bot to explain its reasoning step by step rather than simply providing an answer. Their recent paper offers excellent examples of the ‘process supervision’ method, which rewards models for each step of a chain-of-thought response, versus the ‘outcome supervision’ method, which rewards only the final answer. A rough sketch of the prompting difference is below.
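This sketch assumes a generic LLM completion call; the `ask()` helper and the exact wording are placeholders, not OpenAI's interface.

```python
# Minimal sketch of the prompting difference. ask() is a hypothetical
# stand-in for any LLM API call; the prompts are illustrative.

QUESTION = "A train travels 180 km in 2.5 hours. What is its average speed?"

direct_prompt = f"{QUESTION}\nAnswer with just the final number."

chain_of_thought_prompt = (
    f"{QUESTION}\n"
    "Let's think step by step. Show each step of your reasoning, "
    "then state the final answer on its own line."
)

# direct_prompt typically yields only "72 km/h"; the chain-of-thought
# prompt also yields the intermediate reasoning (180 / 2.5 = 72),
# which is what process supervision can then grade step by step.
# answer = ask(chain_of_thought_prompt)   # ask() is a placeholder
```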
It gets better. Testing showed that getting human approval* of an LLM’s reasoning also promotes ‘alignment’ (the bot’s conformance to human values and judgments): “process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans”. So far this has been tested on math problems and mathematical reasoning, much like “show your work” back in school. It will be a glorious day when chain-of-thought prompts reliably trigger step-by-step responses aligned with human values on public policy and other issues.
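Here's a toy contrast of the two scoring schemes, assuming the reasoning arrives as a list of steps plus a final answer. The scoring functions are illustrative stand-ins, not OpenAI's reward model.

```python
# Toy contrast between outcome supervision and process supervision.
# The scoring functions are illustrative, not OpenAI's reward model.

from dataclasses import dataclass

@dataclass
class Solution:
    steps: list[str]          # chain-of-thought, one entry per reasoning step
    final_answer: str

def outcome_supervision_score(sol: Solution, correct_answer: str) -> float:
    """Reward only the end result: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if sol.final_answer.strip() == correct_answer else 0.0

def process_supervision_score(sol: Solution, step_labels: list[bool]) -> float:
    """Reward each reasoning step a human (or reward model) endorsed.

    step_labels[i] is True when step i was judged valid; the score is the
    fraction of endorsed steps, so flawed reasoning that luckily reaches
    the right answer is penalized.
    """
    if not sol.steps:
        return 0.0
    endorsed = sum(1 for label in step_labels if label)
    return endorsed / len(sol.steps)

# Example: a solution with one bad step still gets full outcome credit,
# but only partial process credit.
sol = Solution(steps=["180 / 2.5 = 72", "so the speed is 72 mph"],  # wrong unit
               final_answer="72")
print(outcome_supervision_score(sol, "72"))           # 1.0
print(process_supervision_score(sol, [True, False]))  # 0.5
```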
Museum of AI has lamented that people are rarely rewarded, and often disincentivized, for explaining their decision methods. Important decisions are often announced as an act of courage and leadership by an executive, even when the organization’s culture does not reinforce open dialogue and sound decision processes.
AI adoption may be a wake-up call for decision makers who hesitate to show any of their cards. Sure, strategic decisions require more than a play-by-play of the evidence and thought process; spin can encourage necessary buy-in. But as AI gets better at recommending, and slowly learns to explain, humans need to do the same.
*But wait. Which humans?
Human values aligned with ChatGPT responses? Sounds awesome. But how are conflicting (human) values aggregated or otherwise resolved? This is the age-old problem with weighing preferences, whether they’re stated by one or 100,000 humans. See previous posts on how human-in-the-loop AI and multicriteria decision analysis are like HouseHunters. Also a short but sweet Twitter conversation referencing Kenneth Arrow (he of the 1963 theorem) and how to handle social preferences. @xuanalogue is definitely worth following.
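For a feel of why aggregation is hard, here's a toy example in the spirit of Arrow's result: three voters, three policy options, and pairwise majority voting that cycles, leaving no stable "group preference" to align a model to. Purely illustrative; the options and rankings are made up.

```python
# Tiny illustration of why aggregating conflicting preferences is hard:
# with three voters ranking three options, pairwise majority voting can
# cycle, so there is no clear "group preference". Purely illustrative.

from itertools import combinations

# Each voter ranks options best-to-worst.
rankings = [
    ["A", "B", "C"],   # voter 1
    ["B", "C", "A"],   # voter 2
    ["C", "A", "B"],   # voter 3
]

def majority_prefers(x: str, y: str) -> bool:
    """True if a majority of voters rank x above y."""
    wins = sum(1 for r in rankings if r.index(x) < r.index(y))
    return wins > len(rankings) / 2

for x, y in combinations(["A", "B", "C"], 2):
    if majority_prefers(x, y):
        print(f"majority prefers {x} over {y}")
    else:
        print(f"majority prefers {y} over {x}")
# Result: A beats B, B beats C, yet C beats A -- a cycle with no winner.
```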