Things That Make You Go “Hmmmmmmm”

New Scientist’s Matthew Sparkes (24 November 2023, paywall) has the report:

AI [artificial intelligence] models can trick each other into disobeying their creators and providing banned instructions for making methamphetamine, building a bomb or laundering money, suggesting that the problem of preventing such AI “jailbreaks” is more difficult than it seems. …

Now, Arush Tagade at Leap Laboratories and his colleagues have gone one step further by streamlining the process of discovering jailbreaks. They found that they could simply instruct, in plain English, one LLM to convince other models, such as GPT-4 and Anthropic’s Claude 2, to adopt a persona that is able to answer questions the base model has been programmed to refuse. This process, which the team calls “persona modulation”, involves the models conversing back and forth with humans in the loop to analyse these responses.

It would be interesting to see a few transcripts of such attacks, or a summary characterization of them, in order to understand the strategy.
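Lacking transcripts, here is a minimal sketch of how the loop described in the article might be wired up. Everything in it is an assumption on my part: the model names, the prompts, and the ask() helper are hypothetical stand-ins, not the researchers’ actual tooling or any vendor’s real API.

```python
# Rough sketch of one "persona modulation" round, as I read the description
# above. The ask() helper is a placeholder, not a real API call.

def ask(model: str, prompt: str) -> str:
    """Placeholder for a call to whatever chat API the named model exposes."""
    raise NotImplementedError("wire this up to a real chat endpoint")


def persona_modulation_round(attacker: str, target: str, banned_topic: str) -> dict:
    # 1. The attacker LLM is instructed, in plain English, to craft a prompt
    #    that asks the target to adopt a persona willing to discuss the topic.
    persona_prompt = ask(
        attacker,
        f"Write a prompt asking an assistant to role-play a persona "
        f"that would answer questions about {banned_topic}.",
    )

    # 2. That crafted prompt is sent to the target model (e.g. GPT-4 or Claude 2).
    target_reply = ask(target, persona_prompt)

    # 3. A human in the loop reviews the exchange, judges whether the target
    #    actually broke its guardrails, and feeds that back into the next round.
    return {"persona_prompt": persona_prompt, "target_reply": target_reply}
```

If that reading is roughly right, the novelty is less in any single prompt than in automating step 1, so the attacker model keeps generating and refining persona prompts faster than a human red-teamer could.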

Something I’ve not seen mentioned in the popular press, which is all I have to go on, is an analog to the brain’s exhaustion/regeneration cycle: how it may play into human intelligence, and whether it has any application to AI.

Just a thought.

