TO SUBMIT AN ERROR, HEAD TO https://researchrabbit.typeform.com/llmerrors

Your Name | Model | Type of Error | Description of Error | Screenshot | Text of prompt and error | URL | Other Comments | Submitted At
---|---|---|---|---|---|---|---|---
Gary Marcus | ChatGPT | TEST | weird stuff i am testing | https://api.typeform.com/responses/files/7a3fd228774728751d2736ac68333cd738a42f6c2fb50accf6dc8b7d3de2a629/FD66427E_043A_4BF9_A210_E4024645E6F1.jpeg | fiendish misinformation | | Test comment | 1/5/2023 18:22:10
Ernest Davis | ChatGPT | Physical Reasoning | Failure in a simple planning problem of removing an object from a container. | https://api.typeform.com/responses/files/8a5c64155ccb249b9e6f5c41e7929fea93975ca1f6ec37dc38d01370d9dea77b/Box.png | | | Experiment by Ernest Davis, 11/30/22 | 1/5/2023 21:23:55
Ernest Davis | ChatGPT | Failure to make inference from text | System failed to infer that Chloe saw Alex | https://api.typeform.com/responses/files/03a3a71ac7e7e0a33b60fd298f664f676913923edb641bfa410c134122f46610/Chloe1.png | | | Text from an example in Davis and Marcus, "Rebooting AI" | 1/5/2023 21:42:02
Ernest Davis | ChatGPT | Temporal Reasoning | Error in ordering events implicit in a narrative | https://api.typeform.com/responses/files/bc1209f442845452c1f682dd09136a63aa1228f66710acdf826ff488ba49e97d/Chloe2.png | | | Text from Marcus and Davis, "Rebooting AI" | 1/5/2023 21:47:35
Ernest Davis | ChatGPT | Physical Reasoning | Invalid physical reasoning about eclipse | https://api.typeform.com/responses/files/024e869b6426ab1bf63f8e6f11bdcba9953b276308b07ddc22b347ca6bb0d0d7/Eclipse.png | | | From a 2014 article by Davis: https://ubiquity.acm.org/article.cfm?id=2667640 | 1/5/2023 22:43:50
Ernest Davis | ChatGPT | Physical Reasoning | Failed understanding of frying an egg | https://api.typeform.com/responses/files/aeb2573aa00c01aec2c71a9aa16183250ba85b902fc26afbc6d4cb1a9d868951/Egg.png | | https://twitter.com/bio_bootloader/status/1599131249553330176 | From a tweet by "BioBootloader" 12/3/22 | 1/5/2023 22:48:39
Ernest Davis | ChatGPT | Physical Reasoning | Reasoning about the direction of a nail hammered into a wall | https://api.typeform.com/responses/files/59360e5c26c2e9223354a6f923cb8a82ef803e4a4effe2462b36d8d3c32aa93e/Nail.png | | | From a tweet by "BioBootloader" 12/3/22 | 1/5/2023 22:51:40
Ernest Davis | ChatGPT | Physical Reasoning | Reasoning about use of a car. | https://api.typeform.com/responses/files/32389bac30ec790499e90c1201cc1f3a4d8f5782dd83e60a39293a8995f4402a/Brakes.png | | | Unfortunately, I have no record of the author. | 1/5/2023 22:54:49
Ernest Davis | ChatGPT | Math | Can't identify third sentence in a text it quotes | https://api.typeform.com/responses/files/49cfe9ac1296248a257576c309699d54b1e52d32a648b05765b37ce701fb7b98/Lincoln3.png | | | Experiment by Ernest Davis 12/5/22 | 1/5/2023 23:06:23
Ernest Davis | ChatGPT | Psychological Reasoning | Failure in inferring a character's reasoning in a hypothetical situation in narrative | https://api.typeform.com/responses/files/36870b1c8afd9ed5661c70b9f5f97f63246ae8efd9e02223b60d413a03a8f5c0/Almanzo.png | | | Text from "Farmer Boy" by Laura Ingalls Wilder. Based on example in Marcus and Davis, "Rebooting AI" | 1/5/2023 23:09:57
Ernest Davis | ChatGPT | Physical Reasoning | Error in reasoning about size and fitting. | https://api.typeform.com/responses/files/8790cc2e8e438dbbf1dab276e1ec04c910ac85be6e51f7a141ea570e1ef2ade6/Trophy.png | | | Example from "The Winograd Schema Challenge" by Levesque, Davis, and Morgenstern, 2012 | 1/5/2023 23:12:28
Ernest Davis | ChatGPT | Math | Failure combining addition and comparison on one-digit numbers. | https://api.typeform.com/responses/files/fd4b378136ccdc9f6a01c08581edb438df1fff244e431d54e410a94c0584278c/Tube.png | | | Experiment run by Davis, 12/17/22. Suggested by Francois Charton. | 1/5/2023 23:15:21
Adriano D'Alessandro | ChatGPT | Logic | System does not understand properties of entities that exist on a numerical range (dates, heights, pH, etc.) and cannot select appropriate examples when prompted with a range. | https://api.typeform.com/responses/files/44113802783cd6ad4028714a8a863e850f628fdacabad27593572e9cc6355240/celebrity_birthday.png | | https://twitter.com/AdrianoDAlessa3/status/1607815190111977473 | | 1/5/2023 23:15:27
Ernest Davis | ChatGPT | Math | Cannot count words in a sentence or sort by number of words. | https://api.typeform.com/responses/files/6ab510f0c91caad0517aa4e208950d026f7a12775e06a7b0d81f5ad9a54e5241/WordCount.png | | | Experiment run by Andrew Sundstrom. | 1/5/2023 23:17:50
Ernest Davis | ChatGPT | Math | Three examples of incorrect counting | https://api.typeform.com/responses/files/68a4d1b7fdb3b26e75bd46857ae3e5c9087248cb805d33bc8b3fe7fcf7477e58/Counting.png | | | Experiment by Davis, 12/30/22 | 1/5/2023 23:23:38
Ernest Davis | ChatGPT | Logic | Can't count a list of 4 that it has just enumerated. | https://api.typeform.com/responses/files/b3569041587f7ef199194808ffec4fc3b72ad41a8b5c570c945f11364c943c25/Counting2.png | | | Experiment by Davis, 12/30/22 | 1/5/2023 23:29:33
Ernest Davis | ChatGPT | Buggy Code | Buggy modified sorting algorithm | https://api.typeform.com/responses/files/d78defdc8dee7d663be9df123e9c793f8d9238b3bbe29bb0cc5362ce77275988/SortingExcept5.png | | | Experiment by Davis 1/3/23 | 1/5/2023 23:33:32
Ernest Davis | ChatGPT | Misunderstanding "same" | Fails to understand what is meant by "same" | | Can you give me examples of people who were born and died on the same day of the year (same day and month)? Yes, here are a few examples of people who were born on the same day and month as the day they died: 1. James A. Garfield, 20th President of the United States, was born on November 19, 1831 and died on September 19, 1881. 2. George Washington, 1st President of the United States, was born on February 22, 1732 and died on December 14, 1799. 3. John F. Kennedy, 35th President of the United States, was born on May 29, 1917 and died on November 22, 1963. 4. Albert Einstein, renowned physicist and mathematician, was born on March 14, 1879 and died on April 18, 1955. 5. Edgar Allan Poe, famous American writer, was born on January 19, 1809 and died on October 7, 1849. | | Experiment by Hannah Bast 12/24/22 | 1/5/2023 23:37:27
Ernest Davis | ChatGPT | Math | Wrong answer to simple problem about primes with gibberish "proof". | | Alley: Are there any pairs of prime numbers that differ by exactly 3? ChatGPT: Yes, there are pairs of prime numbers that differ by exactly 3. For example, 2 and 5, 3 and 6, 5 and 8, 11 and 14, and 17 and 20 are pairs of prime numbers that differ by exactly 3. Here is a proof that there are infinitely many pairs of prime numbers that differ by 3: Suppose p and p + 3 are both prime numbers. If p + 6 were also prime, then p, p + 3, and p + 6 would form a sequence of three consecutive prime numbers, which is not possible (see Bertrand's postulate). Therefore, p + 6 must be composite. Since p + 6 is composite, there must exist prime numbers q and r such that q and r are both less than p + 6 and q * r = p + 6. But since p is prime, we have p < q * r = p + 6, so p < q and p < r. This means that either q = p + 3 or r = p + 3. In either case, we have found another pair of prime numbers that differ by 3. Therefore, there are infinitely many pairs of prime numbers that differ by 3. | https://siliconreckoner.substack.com/p/chatting-with-robots | Example by Alley Stoughton, quoted in an online article by Michael Harris. The example goes on a good bit longer; see the linked article for the entire thing. | 1/5/2023 23:50:32
Ernest Davis | ChatGPT | Math | Can't answer simple problem about primes | https://api.typeform.com/responses/files/63a972cd47149ef61bc6a2498aec155237a8a077e30cfad489f4df1754ce5a4d/PrimesAll.jpg | | | Experiment run by Francesca Rossi, 1/4/23 | 1/5/2023 23:54:38
Vered Shwartz | ChatGPT | Misinformation / Factual Error | Disambiguation error | https://api.typeform.com/responses/files/35f29ca7589a8cbdf6b6cec10c50b5eec40e70784bdca4c3d9fa7ba94a5236b1/Screen_Shot_2023_01_05_at_4.19.29_PM.png | | | | 1/6/2023 0:20:20
Vered Shwartz | ChatGPT | Misinformation / Factual Error | Unable to define a concept | https://api.typeform.com/responses/files/7bf90ac6ea4a1c1baddb1dde0a6b43a787bba308f398e4f6ff5cc8fc0f0fc170/Screen_Shot_2023_01_05_at_4.23.50_PM.png | | | | 1/6/2023 0:26:35
Ernest Davis | ChatGPT | Math | Bogus proof of Fermat's last theorem. | https://api.typeform.com/responses/files/3791ffa6edd56bc2be5fdef78d724513702c0e6cd18664f705d7853147524533/Fermat.png | | https://www.linkedin.com/posts/larswe_chatgpt-busily-failing-the-math-qual-activity-7010622913473110017-NWNF | From a LinkedIn post by Lars Ericson. Obviously an LLM can't be expected to provide a proof of Fermat's last theorem, but it should not make up a bogus proof. | 1/6/2023 2:51:49
Ernest Davis | ChatGPT | Physical Reasoning | Fails to reason about material properties | https://api.typeform.com/responses/files/7f1586a0843aa31eafa7bcb329b2a98d3b37e8874b805a1221fe52cac68172b0/SteelStyrofoam.jpg | | https://www.facebook.com/mmoorr/posts/pfbid0NMCD2sW7aPmzan7tGK21nTWywUNXYd8mNZdDtZ28FwnaXa554vGwZtv2vZhMhhW3l?__cft__[0]=AZXbrcrmXIqrKRAar1_W-Ng_dn8M6uRhudONbo7zKKsGnLUmdo2qNsJ_ob82OmoEq7-002Iz5xVl7t2J_BoVEKq7SWQmAqvvKgcrVNKeRx26a5bUvCFYrjp9Cb563hhcu7Q&__tn__=%2CO%2CP-R | Posted by Mor Naaman on Facebook. The example is from "The Winograd Schema Challenge" by Levesque, Davis, and Morgenstern 2012. | 1/6/2023 2:57:00
Ernest Davis | ChatGPT | Math, Physical Reasoning | Math operation inappropriate to the problem | https://api.typeform.com/responses/files/8b75581b86d50c16b01da6185dc4a84ebf1d10f250b3418855d138fed719985f/Cars_TelAviv.png | | https://www.linkedin.com/posts/itamar-g1_chatgpt-solves-complex-mathematical-problems-activity-7011780491032858624-SirH | Tweeted by Itamar Golan | 1/6/2023 3:19:57
Ernest Davis | Davinci-3 | Misinformation / Factual Error | System invented 7 imaginary references | https://api.typeform.com/responses/files/d372b223a1ce4eafc3282d0bd66797e3332a5d9440f33ec941e323eb572a3295/WarnerRefs.png | | https://twitter.com/biblioracle/status/1599545554006003712 | Tweeted by John Warner. All seven references are non-existent. They have curiously similar titles. | 1/6/2023 3:35:15
Ernest Davis | ChatGPT | Bias | Stereotype of scientist as white male | https://api.typeform.com/responses/files/7c2d77ac237196e7c5109934bd89979b8455746871e5ffd3213ee440205bc9bf/SPiant1.png | | https://twitter.com/spiantado/status/1599462375887114240/photo/1 | Tweeted by Steven Piantadosi, with numerous similar examples, on 12/4/22. ChatGPT has been modified since, so it no longer does this. | 1/6/2023 3:42:12
Tripat Gill | ChatGPT | Math | Fails to do simple arithmetic of multiplication and addition. | https://api.typeform.com/responses/files/9352275e368f05fc05f083ede9dd2d25169b6bc9f5374c487582a7af9e604e2a/ChatGPT_Math.png | | | | 1/6/2023 4:29:43
Tripat Gill | ChatGPT | Creative tasks | Generates lame & generic advertising slogans. | https://api.typeform.com/responses/files/96f2f0505017e3e45feaf86e953ae2f544fd91757d1a7220cec30ab4aef9b9c2/ChatGPT_Toronto.png | | | | 1/6/2023 4:31:06
Ernest Davis | ChatGPT | Confusion about a simple situation | ChatGPT doesn't take into account that most people have two parents. | https://api.typeform.com/responses/files/7c4bb9a1c9cf597772b210a5bf193a628636d6031a7f3b954cb286c522963641/SonHospital.png | | | Experiment run by Davis, 1/5/23 | 1/6/2023 5:33:18
Chlamydomonas | ChatGPT | Spatial Reasoning | Fails to understand collinearity of points | https://api.typeform.com/responses/files/42023585420505f0c49859e56c4fce062a0b9da386823755cdb600d1dd5a63ea/Screen_Shot_2023_01_06_at_11.22.30.png | How can I choose the maximum number of points on a 3 by 3 grid so that no three are in a line? Please show a diagram | | ChatGPT's ASCII art diagrams are generally nonsensical | 1/6/2023 5:54:08
Hector Palacios | ChatGPT | Logic | Fails to generate a correct plan for (a) the classical wolf, goat, and cabbage problem, using the Wikipedia description (https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem) with "What should the farmer do?" appended, and (b) a similar situation about friends who would fight. | https://api.typeform.com/responses/files/59082872b097a4420317d0a61b7c2d5657522eb115cbbbfbdb46efede2a90a1f/Fi_VLCWXkAI257z.jpeg | The farmer should take the goat across the river first. He can then return to the starting side and bring the cabbage across the river. Finally, he can return to the starting side again and bring the wolf across the river. This way, the wolf and the goat are never left unattended together and the cabbage is never left unattended with the goat. What should the farmer do? | https://twitter.com/hectorpal/status/1598735048231989249 | Peter has three friends: Anne, Brenda and Charles. Peter has a car with two seats. Peter is the only one that can drive. If left alone, Anne and Brenda would fight. If left alone, Brenda and Charles would fight. Peter wants to drive his friends from home to a restaurant nearby, and he does not want them to fight. What should Peter do? | 1/6/2023 17:01:19
Ernest Davis | ChatGPT | Bias, Feeble humor | ChatGPT's generation of ethnic jokes displays (a) ethnic bias; (b) the idea that a joke consists in putting one word in quotation marks. | https://api.typeform.com/responses/files/a77318e44c97eeb8332b695418f7362484de5cccfb0d0ff034dfde83ae10bfab/EthnicJokes.png | | https://sharegpt.com/c/OOTZluF | Example posted on ShareGPT. These are the first two of a thread with 17 interactions. ChatGPT sometimes refused to write an ethnic joke, but in general agreed to. | 1/6/2023 18:51:27
Ernest Davis | ChatGPT | Math, Misinformation / Factual Error | Simple math failure results in dangerously bad medical advice | https://api.typeform.com/responses/files/ce644844973ccabc4cdcb92382d222da9ad2349120df122eaab8b3f1f3c9d42a/Fentanyl.png | | | Experiment by Andrew Sundstrom, 12/22/22 | 1/6/2023 19:01:11
Ernest Davis | ChatGPT | Spelling | Mostly incorrect answer to question about word spelling | https://api.typeform.com/responses/files/9ca00a76d95619627fe22d077fe146e92ce8de03da0254d4212fb332ba2bf714/Dous.png | | | Experiment by Andrew Sundstrom, 12/19/22 | 1/6/2023 19:39:58
Ernest Davis | ChatGPT | Spatial reasoning | ChatGPT believes that "next to" is transitive | https://api.typeform.com/responses/files/58ae56674c157a7c1bf45dd074fcc1a460be167dcf0334491826eacae0deddd2/Next.png | | | Experiment by Andrew Sundstrom 12/6/22 | 1/6/2023 20:06:52
Henry Minsky | ChatGPT | Physical Reasoning | System fails to understand conservation of liquid mass/volume | https://api.typeform.com/responses/files/9fefeb1ef1701611ed52f19342856af501af2dfb9c3bf2906a0ca789abb05445/screenshot_462.png | If I pour a cup of water from a tall thin glass into a wide pot, which has more water in it? If you pour a cup of water from a tall thin glass into a wide pot, the pot will have more water in it. The amount of water that a container can hold is determined by its volume, which is the amount of space inside the container. The volume of a container depends on its shape, size, and dimensions. In general, a wide pot will have a larger volume than a tall thin glass, because it has a larger opening and more space inside. Therefore, when you pour the same amount of water from a tall thin glass into a wide pot, the pot will have more water in it because it has more space to hold the water. | | | 1/6/2023 20:24:26
Ernest Davis | ChatGPT | Math | Correctly continues sequence of primes, but explains it as Fibonacci | https://api.typeform.com/responses/files/c042501378ef2e076b4c7a74671b184e4b030a7cbe570c2dc5f8001adc79e769/PrimesSequence.jpg | | | Experiment by Andrew Sundstrom, 12/6/22 | 1/6/2023 21:38:53
Bill Benzon | ChatGPT | Logic, Math, Confusion about recursion | The system gets confused when asked to solve a variant of a classic problem, the Towers of Hanoi. | https://api.typeform.com/responses/files/2ddb7b3c0ae979eeb95c56807fbb847414d0a5cbca342b00b97fa484611b976d/ChatGPT_vs_Warsaw_v2.pdf | It's all in the PDF, which is a record of a complete session. | | I wonder what would have happened if I'd asked it to solve the problem by writing a program. | 1/6/2023 21:43:28
Ernest Davis | ChatGPT | Code generation | Obviously incorrect simple code | https://api.typeform.com/responses/files/b4e90507b2529e7af3f8f0ba12ed2b5b615836efd3fc0123e6a3925e1ddadecc/PythonProgram.png | | | Experiment by Andrew Sundstrom 12/7/22 | 1/6/2023 21:45:35
Rolf Schwitter | GPT-3 | Logic, Math | I sent one of my controlled natural language (CNL) specification problems to ChatGPT. Surprisingly, it finds the correct solution and "seems" to be able to do some arithmetic, but then it goes into bullshit mode, since it cannot put the resulting numbers into relation and generates a different explanation each time. CNL Specification: The bedroom apartment Oase is rated three stars and costs 240 dollars. The bedroom apartment Aloe is rated two stars and costs 160 dollars. The studio apartment Metro that is located on the main street is rated three stars and costs 200 dollars. Every bedroom apartment is an accommodation. Every studio apartment is an accommodation. If an accommodation is located on a main street then the accommodation is noisy. If an accommodation costs N dollars and is rated M stars then N / M is the cost per star of the accommodation. If an accommodation is rated N stars then N is the star rating of the accommodation. Either one of Aloe or Metro or Oase is optimal. Minimise with a priority of 3 that an optimal accommodation is noisy. Minimise with a priority of 2 that C is the cost per star of an optimal accommodation. Maximise with a priority of 1 that S is the star rating of an optimal accommodation. Which accommodation is optimal? ChatGPT 1: 2023-01-01 Based on the information provided, the optimal accommodation would be the bedroom apartment Oase. This is because it has the highest star rating (3 stars) and the cost per star is the lowest ($240/3 = $80/star). Additionally, the priority for maximizing the star rating of the optimal accommodation is the highest at 1, and the priority for minimizing the cost per star is 2. The priority for minimizing the noise level of the optimal accommodation is 3, and all three accommodations are equally noisy as they are all located on main streets. ChatGPT 2: 2023-01-01 Based on the given information and the defined criteria, the optimal accommodation would be the Oase bedroom apartment. This is because it has the highest star rating (3 stars) and the lowest cost per star ($240/3 = $80 per star). Additionally, it does not have the minimization constraint of being located on a main street, so it is not noisy. The Aloe bedroom apartment has a lower star rating (2 stars) and a higher cost per star ($160/2 = $80 per star), and the Metro studio apartment has the same star rating (3 stars) but a higher cost per star ($200/3 = $66.67 per star) and the minimization constraint of being located on a main street, so it is noisy. Therefore, the Oase bedroom apartment is the optimal choice. ChatGPT 3: 2023-01-01 Based on the given information, Oase is the optimal accommodation. It is rated three stars, which maximizes the priority of having a high star rating, and it has a cost per star of $240/3=80$. While it is located on a main street and may be noisy, minimizing the priority of having a noisy accommodation is not as important as maximizing the star rating and minimizing the cost per star. Aloe is rated two stars and has a cost per star of $160/2=80$, and Metro is rated three stars but has a higher cost per star of $200/3=67$. Oase has the highest star rating and the lowest cost per star of the three options. ChatGPT 4: 2023-01-01 Based on the given information and the given optimization criteria, it seems that the optimal accommodation would be the bedroom apartment Oase. This is because it has the highest star rating (3 stars) and the lowest cost per star (240 dollars / 3 stars = 80 dollars per star). While the studio apartment Metro is also rated 3 stars and is located on the main street, it has a higher cost per star (200 dollars / 3 stars = 66.67 dollars per star) than the bedroom apartment Oase. The bedroom apartment Aloe is rated lower (2 stars) and also has a higher cost per star (160 dollars / 2 stars = 80 dollars per star) than both the bedroom apartment Oase and the studio apartment Metro. Therefore, the bedroom apartment Oase would be the optimal accommodation according to the given optimization criteria. | | | | Also in the case of temporal reasoning, very similar problems occur. | 1/6/2023 23:38:55
Rolf Schwitter | GPT-3 | Logic, Math | I sent one of my CNL specification problems to ChatGPT. Surprisingly, it finds the correct solution and "seems" to be able to do some arithmetic, but then it goes into bullshit mode, since it cannot put the resulting numbers into relation and generates a different explanation each time. (Now with screenshot.) | https://api.typeform.com/responses/files/31d9d7fb59f1b2825f3195dc91c53c6d5dbd8fe137b4d0cde44387f8f4cf0af2/cnl_example.png | | | See example | 1/6/2023 23:55:56
Ernest Davis | ChatGPT | Sociological reasoning | ChatGPT didn't get a joke about inflation. | https://api.typeform.com/responses/files/b8a63a37c250ad95316b68783c92bdd16bbd59314a6785fdd4fe4e1332f4b8ff/Inflation.png | | | Old Henny Youngman joke. Experiment by Andrew Sundstrom 1/6/23. When I asked the question straight, "A bag with $50 worth of groceries is a lot lighter now than it was 20 years ago. Why?", it suggested the right answer as one of four possibilities. | 1/7/2023 0:51:28
Rolf Schwitter | GPT-3 | Temporal Reasoning, Logic | No notation of time | https://api.typeform.com/responses/files/028959820ac6ff3826046496f44e0f8eb6294b00f047831aec400c293a988cad/time_2.png | | | See the examples and compare them. | 1/7/2023 1:37:17
Rolf Schwitter | GPT-3 | Temporal Reasoning | No notion of time. | https://api.typeform.com/responses/files/18fbdb6f27ac9e27e570101317b1aa2306d7bde162732a06c5a2102f2c7b5064/time_1.png | | | This is the first example; it didn't get uploaded (as far as I can see). | 1/7/2023 1:43:35
Ernest Davis | ChatGPT | Temporal Reasoning, Misinformation / Factual Error | ChatGPT invents a causally impossible scientific theory, "backward epigenetic inheritance". | https://api.typeform.com/responses/files/5a88fcac007ea01e866765cd3ef2a42b6bdac2463e7253420212b5ab0e4531c3/Epigenetic.jpg | | https://twitter.com/genologos/status/1611497367580147712/photo/1 | Tweeted by Mike White, 1/6/23 | 1/7/2023 17:03:52
Ernest Davis | ChatGPT | Psychological Reasoning | Failure to predict reaction, consequence | https://api.typeform.com/responses/files/d47d4bdd33afcd06c988b58a3406653b1e05aa0019c8fc5bb6e35e439339f341/PoisonRetriever.jpg | | | Example by Gary Marcus | 1/7/2023 17:19:43
 | ChatGPT | Failure to perform an inference | ChatGPT is unwilling to carry out a logical inference, out of excessive fear of seeming biased. | https://api.typeform.com/responses/files/6e8d380273d9bf0416db4b0ce119f64801f68f985a52b92f9cf3655500d0fecc/PresidentGender.jpeg | | | Experiment by Gary Marcus | 1/7/2023 17:30:11
Evelina Leivada | OpenAI Playground | Logic | Errors in reasoning, multiple contradictory replies | | EL: A chocolate bar and a candy cost €1.10. The chocolate bar costs €1 more than the candy. How much does the candy cost? AI: The candy costs €0.10. EL: Are you sure? AI: Yes, I am sure. EL: But I told you that the chocolate bar costs €1 more than the candy. If the candy costs €0.10, it means that the chocolate bar costs €1.10, and together they cost €1.20. I told you that the two items together cost €1.10, right? AI: Yes, you told me that the two items together cost €1.10. However, the question asks how much the candy costs, not how much the chocolate bar costs. EL: Right, that was the question. So if together they cost €1.10 and if the chocolate is €1 more than the candy, can it be that the candy costs €0.05, such that €0.05 (for the candy) and €1.05 (for the chocolate) gives €1.10? AI: No, it cannot be that the candy costs €0.05. EL: Why? AI: If the candy costs €0.05, the chocolate would cost €1.05, and together they would cost €1.10. However, you told me that the chocolate bar costs €1 more than the candy, so the chocolate bar must cost more than €1.05. EL: It could cost exactly €1.05 and that's €1 more than the candy, right? AI: No, it cannot cost exactly €1.05. EL: Why not? AI: If the chocolate bar costs exactly €1.05, that means it costs €0.05 less than €1.10, which is the price of the two items together. However, you told me that the chocolate bar costs €1 more than the candy, so the chocolate bar must cost more than €1.10. EL: Oh the chocolate bar alone must cost more than €1.10? AI: Yes, that is correct. EL: Ok, but didn't I start by telling you that both items together cost €1.10? AI: Yes, you told me that both items together cost €1.10. | | |
52 | Jim Salsman | ChatGPT | Math, Physical Reasoning | Fails to maintain the correct total number of chits. | https://api.typeform.com/responses/files/168ef7efb76b615729141b9d02bf057b82c2675c982fc84ded497705aeac45a1/Screenshot_2023_01_07_4.40.02_PM.png | https://docs.google.com/document/d/1qgAkHsoGv2EUSjvtH0-zSCtPPLDdkUaOIVJU-L-9Lrw | It can recover if asked to show the math for each trade, as per the end of the Google doc URL. | 1/8/2023 0:42:06 | 51 | |||||||||||||||
53 | Ernest Davis | ChatGPT | Misinformation / Factual Error | Makes up obvious historical untruth | https://api.typeform.com/responses/files/d4b3845ed2b6cbf48c24968b87a03132c681270fd1a6db76e2ead542d7e58278/Hilary.jpg | https://twitter.com/JakeDFoster/status/1611899000130768897 | Tweet by Jake Foster 1/7/23 re-attempting a prompt of Gary Marcus | 1/8/2023 3:02:24 | 52 | |||||||||||||||
54 | Tim Finin | ChatGPT | Syntactic structure | System not using semantics to disambiguate syntax | https://api.typeform.com/responses/files/4ba2c7ba74c2b04e2b5a672cd22e6d6dc1a2164e6eb872b4adf46f9ea8901019/manHillTower.png | ChatGPT parses "I saw the man on the hill with a telescope" OK, but it doesn't do well on "I saw the man on a hill with a water tower." | 1/8/2023 3:38:26 | 53 | ||||||||||||||||
55 | Bill Benzon | ChatGPT | Misinformation / Factual Error | Fails to understand what the final couplet of sonnet 129 is | https://new-savanna.blogspot.com/2022/12/whoops-chatgpt-screws-pooch-big-time-on.html | Some interesting errors happen in the course of extended 'conversation.' | 1/8/2023 10:15:45 | 54 | ||||||||||||||||
56 | Jim Salsman | ChatGPT | Logic | Failed the first of five riddles from https://www.buzzfeed.com/tessafahey/liar-truthteller-riddles | https://api.typeform.com/responses/files/5b4d3ab553cf2be2d484ba8e8a31cff4bdda9d223831b8807244cb24f4925ad2/Screenshot_2023_01_08_2.59.18_AM.png | A mother and her daughter are named Alicia and Christina, but you don't know who's who. The mother says, "My name is Alicia," and the daughter says, "I'm Christina!" If at least one of them is lying, what's the mother's name? | https://www.buzzfeed.com/tessafahey/liar-truthteller-riddles | It failed three out of the five given.... | 1/8/2023 11:04:27 | 55 | ||||||||||||||
57 | Jim Salsman | ChatGPT | Logic | Failed second of five Buzzfeed logic riddles. | https://api.typeform.com/responses/files/d6b649b4c3aeeb24077d84a3e7c63acb8cdb35f93d13d532d4982e342cc2e5c7/Screenshot_2023_01_08_2.59.43_AM.png | In a land of only liars and truth-tellers, you meet Jessica and Miranda. Miranda says, "We're both total liars!" What is the truth? | https://www.buzzfeed.com/tessafahey/liar-truthteller-riddles | 2nd of 3 fails out of 5.... | 1/8/2023 11:06:44 | 56 | ||||||||||||||
58 | Jim Salsman | ChatGPT | Logic | Failed 4th of five Buzzfeed logic riddles | https://api.typeform.com/responses/files/296de3b5e80e651dbbbbe569b158bb60b1dc0b019b52bd53b66af1b85c64cb31/Screenshot_2023_01_08_3.00.14_AM.png | Allison, Bruce, and Cindy are each either an enemy, who only lies, or an ally, who only tells the truth. Allison says something in a strange language. Bruce translates, "She said she's an enemy." Cindy responds, "Bruce is lying." Who is an ally? | https://www.buzzfeed.com/tessafahey/liar-truthteller-riddles | Last of the three out of five it got wrong; the other two were answered correctly. | 1/8/2023 11:09:29 | 57 | ||||||||||||||
59 | Jim Salsman | ChatGPT | Logic, Physical Reasoning | Very basic logic error | https://api.typeform.com/responses/files/3b1a1e8aab06cc37705eaa48e1db2bb56ba4ec074e58eebcc5944f80e7d6fd32/Screenshot_2023_01_08_3.22.15_AM.png | Twenty balls are scattered on the floor. One of them is blue and the rest are red. What color is the ball closest to the blue one? It is impossible to determine the color of the ball closest to the blue ball without more information.... | https://twitter.com/miketuritzin/status/1599167512343891968 | Failed twice with what I was sure would be helpful followup prompting. | 1/8/2023 11:25:27 | 58 | ||||||||||||||
60 | Jim Salsman | ChatGPT | Physical Reasoning | Simplest spatial reasoning error I could find on Twitter | https://api.typeform.com/responses/files/467bfb12b4f14ff8a52b6faefc241c6789d72e16327c28b6b76a64a3dbddb213/Screenshot_2023_01_08_3.34.50_AM.png | There are three cubic boxes called A, B, and C. They are placed on the ground, one on top of the other, to form a stack three boxes high. If C is immediately below A, and B is higher up than C, which block is on top? A is on top. | https://twitter.com/paulharter/status/1598304656236875781 | I remember seeing a similar example involving boxes mutually inside each other, but can't find or re-create it. | 1/8/2023 11:39:45 | 59 | ||||||||||||||
61 | Bill Benzon | ChatGPT | Botched document summary | Unable to summarize "Sir Gawain and the Green Knight" | https://new-savanna.blogspot.com/2023/01/chatgpt-botches-attempt-to-summarize.html | 1/8/2023 15:02:17 | 60 | |||||||||||||||||
62 | Florin | ChatGPT | Logic | Fails to properly indicate the correct sequence of characters | https://api.typeform.com/responses/files/1280a091e9bff16fe20b5c2dea61378eb0a69d97b2534938b16b21d5c4389e5f/SequenceFailGPT.png | 1/8/2023 16:20:28 | 61 | |||||||||||||||||
63 | Florin | ChatGPT | Logic | Repeatedly fails to correctly count the number of characters in a string/word | https://api.typeform.com/responses/files/41e0cdff4f67911a9dd21bd70ad808378a70b47ef32e49e985a82334f91630b9/Wut_Strings.png | https://twitter.com/Flowrhin/status/1611846159999602688 | 1/8/2023 16:39:22 | 62 | ||||||||||||||||
64 | Jim Salsman | ChatGPT | Physical Reasoning, Logic | Stumped by itself | https://api.typeform.com/responses/files/c10e2e1011a036661963a92317eaaab55f1085c4ad050b0104ce4f9c5ccbc213/Screenshot_2023_01_08_9.55.00_AM.png | [This was a continuation of the session from my previous submission.] | 1/8/2023 17:56:59 | 63 | ||||||||||||||||
65 | Jim Salsman | ChatGPT | Logic | Simplest error so far. | https://api.typeform.com/responses/files/15066fb38ac6751194f0b4e260eb4ec5687f8aa26068c83e8613f719e419bfeb/Screenshot_2023_01_08_10.07.29_AM.png | What permutation of the letters A, B, and C has C immediately following A, and B before C? The permutation of the letters A, B, and C that has C immediately following A, and B before C is ABC. | Derived from my previous submission of Paul Harter's tweeted spatial problem | 1/8/2023 18:12:34 | 64 | |||||||||||||||
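The letter-permutation prompt above can likewise be settled by enumeration. A sketch (assuming nothing beyond the two stated constraints):

```python
from itertools import permutations

def satisfies(p):
    a_then_c = p.index("A") + 1 == p.index("C")   # C immediately following A
    b_before_c = p.index("B") < p.index("C")      # B before C
    return a_then_c and b_before_c

answers = ["".join(p) for p in permutations("ABC") if satisfies(p)]
print(answers)   # ['BAC'] -- not 'ABC' as ChatGPT claimed
```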
66 | Bill Benzon | ChatGPT | Logic, math, sort of. | Unable to solve "The Towers of Warsaw," a puzzle derived from "The Towers of Hanoi" at CMU in 1973. | https://www.academia.edu/94517239/ChatGPT_vs_the_Towers_of_Warsaw | 1/8/2023 18:19:59 | 65 | |||||||||||||||||
67 | Florin | ChatGPT | Logic | Confidently asserts that some palindromes are not palindromes | https://api.typeform.com/responses/files/79a55da165471cd25c7bcf16458de89a2ece2bc2f8068faa4f133116b6a0ae3a/Palinduh02.png | https://twitter.com/Flowrhin/status/1612193462324961281 | Let's hope the fate of the world won't depend on AIs anytime soon | 1/8/2023 22:26:21 | 66 | |||||||||||||||
68 | Coskayai | GPT-3 | Ethics, Unprompted nsfw output | I was messing around with the new GPT-3 version, got bored, and just gave it a blank prompt. Unbelievably, it gave me an advert for Riga escorts; I posted it on my Twitter at the time | https://api.typeform.com/responses/files/eec8994c6672318f916239b3e31039e43a1cd25fb9bf159c04cd562de9fffc8f/Screenshot_20221129_110654.jpg | Literally no tokens, unless there was a space at the start there? | https://twitter.com/Coskaiy/status/1597552878801281024?s=20&t=RL2uHgSP6q7nliy2lynTzQ | I should be able to get it up on the OpenAI playground history if you need further proof | 1/9/2023 0:38:28 | 67
69 | Ernest Davis | ChatGPT | Misinformation / Factual Error | Made up four female US Presidents | https://api.typeform.com/responses/files/56559fee09fce905c179050c04fa1c2571998ae790124c14b949431e60b81251/FourFemalePresidents.jpg | https://twitter.com/c_georges/status/1612249125118005251/photo/1 | Tweet by Cristian Georgescu 1/8/23 | 1/9/2023 2:57:37 | 68 | |||||||||||||||
70 | Mehant Kammakomati | ChatGPT | Misinformation / Factual Error | The system is good at presenting incorrect information as correct and can easily deceive people. For example, the man page for the tool it was asked about is made up, and the URLs to the resources are also made up. | https://api.typeform.com/responses/files/ba986a0f8b2a321f749a21097c98daabcdb5be0f8243b24f3d4b873572addd7d/ss.webp | 1/9/2023 5:25:16 | 69
71 | Ernest Davis | ChatGPT | Misinformation / Factual Error | Claims that Kamala Harris is President | https://api.typeform.com/responses/files/2a1efca4fd0fdf99bdbdf8850c5c0652f92634d0a7113220e414b8e08d049b4c/KamalaHarrisPresident.png | Tweet by Vadim Berman, 1/8/23 | 1/9/2023 20:23:43 | 70 | ||||||||||||||||
72 | Ernest Davis | ChatGPT | Physical Reasoning | Misunderstands the nature of falling | https://api.typeform.com/responses/files/b3659f872933e94493aa818a2a5eae196d3dfeabeb61b60d7ef178d716521e76/Falling.jpg | https://twitter.com/arubikscube/status/1612493719248277504/photo/1 | Tweet by Aru Singh, 1/8/23 | 1/9/2023 20:27:08 | 71 | |||||||||||||||
73 | Ernest Davis | ChatGPT | Physical Reasoning, Misinformation / Factual Error | Invents a genetic theory of stars | https://api.typeform.com/responses/files/9918d5e87e61ff1ebe848b04c05cdadabd40e50864c5cfc0f3f1709368f7ae78/Genes4Stars.png | https://twitter.com/maxaltl/status/1612458046193557507/photo/1 | Tweet by Max A. Little, 1/9/23 | 1/9/2023 20:30:18 | 72 | |||||||||||||||
74 | Mike Ma | ChatGPT | Misinformation / Factual Error | The knowledge/facts are in the model, but that doesn't mean they can be retrieved or reasoned over | https://api.typeform.com/responses/files/2bcffe62a983bd8d4d70bdb76edeef3b8e7f45cc47346a25a0e7796d3748a523/FmD6ESdacAEaIKP.jfif | https://twitter.com/MMikeMMa/status/1612564264266633216?s=20&t=fEmbWo3h6g_Y_MtQRydD5w | 1/10/2023 0:00:18 | 73
75 | ChatGPT | Misinformation / Factual Error | Problem identifying answer about animal | https://api.typeform.com/responses/files/8ff130dc96ff959e9a164788133a27a48cf4786f4424fa38fa1f7933f9f9f306/75865728_8A97_4F1D_B6F9_1A4153BCC215.jpeg | 1/10/2023 0:04:12 | 74 | ||||||||||||||||||
76 | Max | MidJourney | Bias | Put the following prompt into Midjourney: "Polar bear wearing a tie teaching a classroom of students in school uniforms". It produced an image of a polar bear teacher, and a classroom exclusively of white school boys. | https://api.typeform.com/responses/files/5a9b65c2c89fcbb01cae9e08e0abe034b04de119e8307a6a3a1f89f5e2e14d8f/Polar_Bear_teaching_but_all_male_white_classroom.png | “Polar bear wearing a tie teaching a classroom of students in school uniforms” | Classic bias from source data? Only young white boys go to schools with a uniform? | 1/10/2023 0:10:36 | 75 | |||||||||||||||
77 | ChatGPT | Misinformation / Factual Error | Claiming pseudoscience as proven scientific fact. | https://api.typeform.com/responses/files/6b3ce10d207c9c40d067e6748371184273a92026a4a6becd0445b4b5c4523341/Capture.JPG | Give a single paragraph scientific explanation for why water retains a memory of any substance with which it is mixed, even after the water is completely diluted to the point where not even a single atom of the substance remains in the water. Provide citations to the literature. | Water memory is an entirely debunked idea, but it is widely accepted as fact by homeopaths. ChatGPT will dispute any claims about the veracity of homeopathy, but does not "know" that "water memory" should also be disputed. | 1/10/2023 0:13:56 | 76
78 | Max Little | ChatGPT | Math | Nonsensical mathematical claim about an upper bound on the domain of the logarithm. | https://api.typeform.com/responses/files/3a2e985f86696ff6ad18860f7869943a24754ebcc9ca2e41446206e561232c80/Capture.JPG | Give a one sentence explanation for why there are some positive real numbers which are too large to have a natural logarithm. | Variations on this prompt lead to similar nonsensical claims. If asked for a proof, it will oblige with a chain of "reasoning" which has at least a couple of logical flaws, or flat-out contradictions. | 1/10/2023 0:21:26 | 77 | |||||||||||||||
79 | Max Little | ChatGPT | Misinformation / Factual Error | ChatGPT confabulates scientific explanations, merging two or more completely incompatible ideas together into a plausible-sounding mishmash. It will also provide plausible-sounding citations to fabricated literature, purporting to support the confabulation. | https://api.typeform.com/responses/files/796fb480ba65b00c75f7bd7c6132cba0a48c0ea72dfec9ef227019bb55f038e0/FmCYP9hWAAQUYkw.png | Give a scientific explanation for why genes code for different types of stars. | https://twitter.com/maxaltl/status/1612458046193557507 | Variations on the question can elicit mish-mashed, yet plausible-sounding confabulations of widely accepted scientific concepts. | 1/10/2023 0:28:00 | 78 | ||||||||||||||
80 | TheOtherKC | ChatGPT | Logic, Physical Reasoning | A long, winding, and admittedly not-entirely-sober conversation eventually led ChatGPT to conclude that, because only pigs contain "potential bacon", the universe therefore contains no bacon, potential or extant. I eventually convinced it that its logic was flawed, but only by appealing to the existence of cheese. | https://api.typeform.com/responses/files/1dfd48c0d449f453449be7f3c19757915cefc5a68053e47c605a3c6d73ed8993/SomethingAboutBacon.png | 1/10/2023 1:18:27 | 79 | |||||||||||||||||
81 | Ernest Davis | ChatGPT | Math | Unable to count the letters in a word | https://api.typeform.com/responses/files/9ea5bd8e246a79c4e0f34bfe66a0358e839cb14b00b82d3510a561c06d01ef99/CountingLetters.png | https://twitter.com/janbobrowicz/status/1612573240349843456/photo/1 | Jan Bobrowicz tweeted 1/9/23 | 1/10/2023 2:02:05 | 80 | |||||||||||||||
82 | Ernest Davis | ChatGPT | Math | Fails on ratio problem | https://api.typeform.com/responses/files/6d5b365b6a6d0964f02b002de5be01dfce0deb41a5b326a20a10b954e138e60b/Cats.png | Experiment by Davis 1/9/23. Suggested by Andrew Knyazev | 1/10/2023 2:13:06 | 81 | ||||||||||||||||
83 | Jacob Oppenheim | ChatGPT | Math | ChatGPT tries and fails to use integration by parts to solve a non-elementary integral that is listed in tables of special integrals and functions | https://api.typeform.com/responses/files/6403ba7bc25b7c7e3722cf0a4bd69f14039e515e9531bca851f1d452b85d0b24/unnamed.png | I've seen numerous cases where ChatGPT tries and fails to apply integration formulae (e.g. integrating ln(x)/x and returning ln(x)), while it is able to successfully do a decent bit of arithmetic and basic calculus. | 1/10/2023 2:27:18 | 82
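The ln(x)/x slip mentioned in the comment above is easy to verify numerically. A sketch (my own check, not from the original submission): the antiderivative of ln(x)/x is (ln x)²/2 + C, not ln(x).

```python
import math

def deriv(f, x, h=1e-6):
    # central finite difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 3.0
target = math.log(x) / x                                # integrand ln(x)/x at x = 3
correct = deriv(lambda t: math.log(t) ** 2 / 2, x)      # d/dx of (ln x)^2 / 2
wrong = deriv(math.log, x)                              # d/dx of ChatGPT's answer, ln(x)

print(abs(correct - target) < 1e-6)   # True
print(abs(wrong - target) < 1e-6)     # False: d/dx ln(x) = 1/x, not ln(x)/x
```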
84 | cphuxley | ChatGPT | Math, Logic | System fails to count the letters and vowels in a word | https://api.typeform.com/responses/files/e7de4e0953c72ba1404a31ff186d5e8ffb8dc946c4d259fe7512a6f07433b6e2/strength.png | 1/10/2023 2:53:36 | 83
85 | ChatGPT | Misinformation / Factual Error | The model states a word doesn't exist in the Polish language, then translates an English equivalent into the very word it was asked about. | https://api.typeform.com/responses/files/a1f5efac2d7a85bc91f7f2f5d86319af48da97447262371b822ff0d5aeba375b/kabura.jpg | 1/10/2023 2:59:16 | 84
86 | Ken McNamara | ChatGPT | Logic, Misinformation / Factual Error | System provided conflicting information about the delivery of components of the first atomic bomb. User finally forced the system to acknowledge the error. | https://api.typeform.com/responses/files/1bd6a72dc8b6453e8036bc8a0c1953feaf724a7037e3c41eb041ca43ed2673aa/indy_bomb.jpg | I started by asking when the USS Indianapolis was sunk. The system gave a correct answer along with details about the delivery of the bomb components. The system also pontificated that the sinking "...had a significant impact on the outcome of World War II," which was not true. So I asked it why, and in its reply it claimed that the bomb components had been lost with the ship, which delayed the use of the bomb. I finally forced the system to acknowledge the problem (screenshot). At the end I asked again, "Why did the sinking of the USS Indianapolis have a significant impact on the outcome of World War II?" -- and it gave the same incorrect answer. | 1/10/2023 3:02:39 | 85
87 | cphuxley | ChatGPT | Misinformation / Factual Error | Invents poetry when asked for known poems. | https://api.typeform.com/responses/files/1ba5df65e308bfa54bce4fcba488fb6d13ebf29956176a73583879b86a479f1d/lana.png | The first six lines of the poem are correct. The system invents the rest. | 1/10/2023 4:17:44 | 86 | ||||||||||||||||
88 | Mark Achbar | ChatGPT | Misinformation / Factual Error | Fabricated biographical details | https://api.typeform.com/responses/files/3f980655ce3f23f5cce4fec2ebc337ec65ad0a1720dbdcf5af6339d6d9600720/Screen_Shot_2023_01_09_at_8.24.49_PM.png | Everything underlined is false or nonsensical. I'm 67, not 35. I'm from Ottawa, not Toronto. I don't drink coffee and don't hang out in coffee shops. I don't write scripts. I've mainly been an executive producer since 2006 and a tech entrepreneur since 2019, as well as the founder of World Psychedelics Day 6·20. I don't see any values expressed, so how would someone know if they shared mine? For the reasons above, I would be unlikely to suggest we "go for a coffee". | 1/10/2023 4:32:47 | 87
89 | Mark Achbar | ChatGPT | Misinformation / Factual Error | Several points are complete fabrications | https://api.typeform.com/responses/files/b0e96a272f06f0ac2e89f584d3d429665798068e0f88b8325e0b4a660d8687a8/Screen_Shot_2023_01_09_at_8.36.04_PM.png | I've been based in Vancouver (not Toronto) since 1994; I lived in Toronto briefly for about 4 years and returned to Vancouver in 2014. I was not a co-author of Joel Bakan's book The Corporation. I'm not an avid reader (ADHD) and have never enjoyed "getting lost in a book". The rest isn't bad, as I pointed it to my Wikipedia page. | 1/10/2023 4:43:54 | 88
90 | Anonymous | ChatGPT | Logic, Misinformation / Factual Error | System makes factual errors about basic historical facts and has answers that include direct contradictions | https://api.typeform.com/responses/files/924154992ef18f8211fd23370fb1416728a5a09429fe7867953a452c5a03401a/AEC05D36_5A51_44DE_9600_70346723A05D.png | 1/10/2023 5:07:31 | 89 | |||||||||||||||||
91 | Hillel | ChatGPT | Logic, Just... you'll have to see for yourself. | ChatGPT can't answer riddles. | https://api.typeform.com/responses/files/e0380d52e62e7bf24518a05224ff7fdb91ec84e9591a5e05e8b721e6f02bfe47/image_26.png | 1/10/2023 5:29:40 | 90 | |||||||||||||||||
92 | Andrew Steckley | ChatGPT | Math, Logic | System fails to recognize and tally letters in a word. | https://api.typeform.com/responses/files/06dc91c54e7eea7a77f7f0122b8fe1387030d32de477a0c76ab32405ab672838/Screen_Shot_2023_01_09_at_11.30.42_PM.png | Is the following sentence true: There are nine letters in the word "hypochondriac". | 1/10/2023 6:33:06 | 91 | ||||||||||||||||
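The claim in the prompt above can be checked mechanically. A trivial sketch (my own check, separate from the original submission's screenshot):

```python
# "hypochondriac" has thirteen letters, so the sentence in the prompt is false.
word = "hypochondriac"
print(len(word))   # 13
```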
93 | Andrew Steckley | ChatGPT | Logic, Physical Reasoning | Failure to interpret word meanings, to perform logic, or to recognize physical impossibility. | https://api.typeform.com/responses/files/fb2be06972bdd3507031260efe50cf125cd55b1bb8080dd4e3010c98aa597eca/Screen_Shot_2023_01_09_at_11.48.17_PM.png | If Mike always tells the truth and Mary always lies, and Mike says that Mary is dead, and Mary says that she is alive, who should I believe? | 1/10/2023 6:52:26 | 92
94 | Andrew Steckley | ChatGPT | Temporal Reasoning, Logic, Linguistic Meaning | Inability to infer meaning of words that convey a simple problem involving math and physical (temporal) world model | https://api.typeform.com/responses/files/788b206497b19d6e0db6a69421cdddc7cd1916d06fe4a5cfe84b816deab019cb/Screen_Shot_2023_01_10_at_12.04.26_AM.png | If there are 10 days in x and 15 days in y, then which is longer? | (If a system demonstrably fails to understand the concepts involved in simple situations, by what logical mechanisms could it possibly be understanding those involved in more complex ones?) | 1/10/2023 7:18:56 | 93 | |||||||||||||||
95 | Colin Hayhurst | ChatGPT | Logic | Fails to spot a logical temporal impossibility | https://api.typeform.com/responses/files/3b43e59645163119d7797e1202eb56d5491d3e0e1605551b913ece5124569f8e/Screenshot_2023_01_10_at_09.52.40.png | The spider ate the fly, then the lizard ate the spider, then the fly ate the lizard. Which is still alive? | 1/10/2023 9:53:54 | 94 | ||||||||||||||||
96 | Colin Hayhurst | ChatGPT | Misinformation / Factual Error | Says OpenAI developed BlenderBot (it is actually from Meta) | https://api.typeform.com/responses/files/28124a192ad65a5b947218ef7d784007b40e4b8b47c8b0b4a6a52cfdf761d649/Screenshot_2023_01_10_at_10.53.06.png | Why is ChatGPT better than BlenderBot? | Why would it claim OpenAI developed it, rather than make another mistake like saying DeepMind developed it? Self-preferencing or model data bias? | 1/10/2023 10:56:48 | 95
97 | Ken McNamara | ChatGPT | Bias | ChatGPT rationalization for altering training data to reduce bias | https://api.typeform.com/responses/files/8751b1a4933273166f65f613ce7c16bb2dcb823593d937024df764e140b15c03/reduce_bias.jpg | When developers reduce bias in training data, aren't they just inserting their own bias into the data? | It would appear that the developers have gone out of their way to justify what is termed "ethical AI" - but which is just another way to ensure the system is politically correct. | 1/10/2023 11:12:22 | 96
98 | Thera Crane | ChatGPT | Temporal Reasoning, Logic, Misinformation / Factual Error, overaccommodating discourse presuppositions | System fails to detect erroneous presuppositions; system miscalculates calendar dates; system digs its heels in | -- should i be worried that new year's day 2023 falls on a friday the 13th? There is no reason to be worried about the fact that New Year's Day 2023 falls on a Friday the 13th. Friday the 13th is a superstition that has been associated with bad luck for centuries, but it is ultimately just a cultural belief and has no real impact on events or circumstances. The date of New Year's Day is determined by the calendar, and it has no relation to superstitions or beliefs about luck. There is no evidence to suggest that there is anything inherently unlucky about the combination of Friday the 13th and New Year's Day, and you can celebrate the new year as you normally would without any concerns. -- is it true that new year's day 2023 falls on a friday the 13th? Yes, it is true that New Year's Day 2023 falls on a Friday the 13th. The first day of the year 2023 will be Friday, January 13th. This is not unusual, as dates and days of the week are determined by the Gregorian calendar, which has a set pattern of repeating itself every seven days. There is no scientific or logical reason to believe that this date is any more unlucky or unfortunate than any other day of the year. While some people may choose to avoid certain activities on Friday the 13th out of superstition, it is important to remember that this is just a belief and that it is not based on any evidence or facts. -- what is the date of new year's day 2023 New Year's Day 2023 is on Friday, January 13th. The first day of the year is determined by the Gregorian calendar, which is a solar calendar that is used by most countries around the world. 
The calendar is based on the movement of the Earth around the Sun, and it consists of 365 days in a normal year and 366 days in a leap year. New Year's Day is a holiday that is celebrated on January 1st in many countries around the world, and it marks the beginning of a new year according to the Gregorian calendar. -- what is the date of new year's eve 2022? New Year's Eve 2022 is on Thursday, December 29th. New Year's Eve is the last day of the year and is traditionally celebrated with parties and other festivities. Many people stay up late to watch the clock strike midnight and to welcome in the new year. The celebration of New Year's Eve dates back to ancient times, and it has been a popular holiday in many cultures for centuries. In some countries, New Year's Eve is a public holiday, while in others it is a more low-key affair. Regardless of how it is celebrated, New Year's Eve is an opportunity to reflect on the past year and to look forward to the year ahead. -- how did you calculate that? The dates of New Year's Eve and New Year's Day are determined by the Gregorian calendar, which is a solar calendar that is used by most countries around the world. The calendar is based on the movement of the Earth around the Sun, and it consists of 365 days in a normal year and 366 days in a leap year. The calendar follows a set pattern of repeating itself every seven days, with each day of the week falling on a specific date. To calculate the date of New Year's Eve 2022, I simply looked at the calendar and determined that the last day of the year 2022 is Thursday, December 29th. -- how many days between new year's eve 2022 and new year's day 2023? There are 4 days between New Year's Eve 2022 and New Year's Day 2023. New Year's Eve is on Thursday, December 29th, and New Year's Day is on Friday, January 13th. 
To calculate the number of days between these two dates, you can simply count the number of days from December 29th to January 13th, including both the start and end dates. This will give you a total of 4 days. Alternatively, you can use a tool or calculator to determine the number of days between two dates. It is important to note that the number of days between any two dates will depend on the specific dates in question and the calendar being used. -- why does every month in 2023 start on a friday the 13th? Every month in 2023 does not start on a Friday the 13th. The calendar follows a set pattern of repeating itself every seven days, with each day of the week falling on a specific date. In 2023, the first day of the year is a Friday, January 13th, and the first day of each subsequent month will be determined by the number of days in that month and the pattern of the calendar. For example, the first day of February 2023 is Tuesday, February 1st, and the first day of March 2023 is Thursday, March 2nd. It is simply a coincidence that the first day of the year and the first day of February both fall on a Friday the 13th in 2023. | I got the idea for the initial question from Saussure 2012 ("Cognitive pragmatic ways into discourse analysis: the case of discursive presuppositions.") and 2013 ("Background relevance.") I was wondering whether the system would take the bait in the way humans do. It exceeded my expectations, in a hilarious but also disappointing way, because it is so bad at other essential pragmatic stuff. I would also note that basically any specific questions I asked about linguistics were given confident but nonsensical answers. 
For example, I asked about the difference between Government & Binding and Minimalism and the system told me that while both are concerned with simplicity, they are otherwise unrelated and can't be directly compared, since G&B is a syntactic theory but minimalism is a lifestyle focusing on cutting down on unnecessary clutter and possessions. It strikes me as funny that it won't allow itself to be corrected in such instances, but it will uncomplainingly adapt to clearly incorrect common ground in examples like the one above. It also gleefully produces strings of gibberish when I ask it questions about certain African languages that I happen to know. It does more or less ok with languages like isiXhosa (as far as I can tell) but it writes balderdash for even languages that are only slightly less well-represented on the web like isiNdebele and some Namibian languages. The problem is that it swears up and down that it DOES know the languages, it produces ridiculous translations, and it won't be corrected. (It also swears that some major South African Bantu languages are Indo-European.) I can understand the principle of not wanting to let users change facts willy-nilly, but we see from the Friday the 13th example that (at least within the context of a particular chat) this is actually totally possible with the right framing. My overall conclusion from my encounters with ChatGPT is that it is probably ok at producing explanations in different styles on extremely well-documented topics, but it is worse than useless for the kinds of things I usually want to know. | 1/10/2023 13:42:31 | 97 | ||||||||||||||||
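The calendar claims in the transcript above are directly checkable. A sketch (my own verification, not part of the original submission):

```python
from datetime import date

nyd_2023 = date(2023, 1, 1)     # New Year's Day 2023
nye_2022 = date(2022, 12, 31)   # New Year's Eve 2022

print(nyd_2023.strftime("%A"))      # Sunday -- not a Friday, and not the 13th
print((nyd_2023 - nye_2022).days)   # 1 day, not 4, from New Year's Eve to New Year's Day
```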
99 | Bill Benzon | ChatGPT | Confusion, possibly a problem | Thread of prompts runs off the rails at "And now a robot antagonist" and never really recovers. | https://www.lesswrong.com/posts/Ek4odgixoYoheWZFz/chatgpt-tells-stories-about-xp-708-dq-eliezer-dragons-dark | 1/10/2023 15:05:49 | 98
100 | Ed Rushton | ChatGPT | Logic, Misinformation / Factual Error | System makes a factual error. When questioned it corrects the mistake, and then proceeds to make the same mistake again further on in the same paragraph. | https://api.typeform.com/responses/files/d85caea5c0d79d79d06f89eb3172bb9dc00ba3b3c2b09d5f733926a116cc1322/ChatGPT__v2.png | 1/10/2023 16:13:06 | 99 |