EA - Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety by titotal

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety, published by titotal on September 18, 2024 on The Effective Altruism Forum.
Disclaimer: I am a computational physicist, and this investigation is outside my immediate area of expertise. Feel free to peruse the experiments and take everything I say with appropriate levels of skepticism.
Introduction:
The Centre for AI Safety (CAIS) is a prominent AI safety research group doing technical AI research as well as regulatory activism. It's headed by Dan Hendrycks, who has a PhD in computer science from Berkeley and some notable contributions to AI research.
Last week, CAIS released a blog post entitled "superhuman automated forecasting", announcing a forecasting bot developed by a team including Hendrycks, along with a technical report and a website, "FiveThirtyNine", where users can try out the bot for themselves. The blog post makes several grandiose claims, purporting to rebut Nate Silver's claim that superhuman forecasting is 15-20 years away, and asserting that:
Our bot performs better than experienced human forecasters and performs roughly the same as (and sometimes even better than) crowds of experienced forecasters; since crowds are for the most part superhuman, so is FiveThirtyNine.
He paired this with a Twitter post, declaring:
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets.
The claim is this: via a chain of prompting, GPT-4o can be harnessed for superhuman prediction. Step 1 is to ask GPT-4o to figure out the most relevant search terms for a forecasting question; those are then fed into a web search to retrieve a number of relevant news articles and extract the information within. The contents of these news articles are then appended to a specially designed prompt, which is fed back to GPT-4o.
The prompt instructs it to boil down the articles into a list of arguments "for" and "against" the proposition and rate the strength of each, to analyse the results and give an initial numerical estimate, and then do one last sanity check and analysis before yielding a final percentage estimate.
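For orientation, here is a minimal sketch of how a prompt chain like the one just described might be wired up. This is not CAIS's code: `call_llm` and `search_news` are hypothetical placeholders for a GPT-4o call and a news-search API, and the prompt wording is my own paraphrase of the steps above.

```python
# Sketch of the described prompt chain, assuming hypothetical wrappers.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a GPT-4o chat-completion call."""
    raise NotImplementedError

def search_news(query: str, max_results: int = 5) -> list[str]:
    """Hypothetical news-search wrapper; returns article texts."""
    raise NotImplementedError

def forecast(question: str) -> str:
    # Step 1: ask the model for the most relevant search queries.
    queries = call_llm(
        f"List the most relevant news search queries for forecasting:\n{question}"
    ).splitlines()

    # Step 2: retrieve relevant news articles for each query.
    articles: list[str] = []
    for query in queries:
        articles.extend(search_news(query))

    # Step 3: append the article contents to a structured forecasting prompt.
    context = "\n\n".join(articles)
    final_prompt = (
        f"Question: {question}\n\n"
        f"Relevant articles:\n{context}\n\n"
        "1. List arguments for and against, rating the strength of each.\n"
        "2. Analyse them and give an initial probability estimate.\n"
        "3. Sanity-check that estimate and output a final percentage."
    )
    return call_llm(final_prompt)
```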
How do they know it works? Well, they claim to have run the bot on several Metaculus questions and achieved accuracy greater than both the crowd average and a test using the prompt of a competing model. Importantly, this was a retrodiction: they ran questions from last year, while restricting the bot's access to information published since then, and then checked its predictions against how the questions actually resolved.
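As an aside on how such a retrodiction might be scored: a standard metric for probabilistic forecasts is the Brier score (mean squared error between predicted probabilities and 0/1 resolutions; lower is better). The report does not spell out its exact procedure, so the snippet below is a generic, hypothetical illustration with made-up numbers, not the CAIS methodology.

```python
# Generic illustration of comparing bot vs. crowd forecasts with a Brier score.
# The numbers below are invented and are NOT from the CAIS report.

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Average of (probability - outcome)^2 over all resolved questions."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

bot_probs = [0.80, 0.30, 0.65]     # hypothetical bot forecasts
crowd_probs = [0.75, 0.25, 0.70]   # hypothetical crowd averages
outcomes = [1, 0, 1]               # how the questions actually resolved

print("bot  :", round(brier_score(bot_probs, outcomes), 4))
print("crowd:", round(brier_score(crowd_probs, outcomes), 4))
```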
A claim of superhuman forecasting is quite impressive, and should ideally be backed up by impressive evidence. A previous paper trying similar techniques, which yielded less impressive claims, runs to 37 pages, and it shows the authors doing their best to avoid any potential flaw or pitfall in the process (and I'm still not sure they succeeded). In contrast, the CAIS report is only 4 pages long, lacking pretty much all the relevant information one would need to properly assess the claim.
You can read feedback from the Twitter replies, the Manifold question, LessWrong, and the EA Forum, which was mostly skeptical and negative, bringing up a myriad of problems with the report. This report united most rationalists and anti-rationalists in skepticism, although I will note that both AI Safety Memes and Kat Woods seemed to accept and spread the claims uncritically.
The most important things to highlight are these Twitter comments by the author of a much more rigorous paper cited in the report, who says the results did not replicate on his side, as well as this critical response by another AI forecasting institute.
Some of the concerns:
The retrodiction...