ISC procedures and policies
Author: Jakub “Kuba” Łącki with input from ISC
Last updated: April 2022
Status: Approved by the ISC
Scope
Procedures
Task selection process
Updating test data and rejudging
Time extensions
Reverse-engineering the test data
Scope
This document outlines some of the procedures and policies that the ISC has been following in recent years. The goals of this document are to:
- Ensure that the policies remain consistent over time.
- Inform the IOI community about the procedures / rationale behind ISC decisions.
- Serve as a starting point for any discussions leading to policy changes.
The policies described in this document are not binding, and the ISC may decide to deviate from them in cases that are deemed exceptional by the ISC.
Procedures
Task selection process
- The submitted tasks are reviewed by the HSC, which identifies a longlist of 20-30 tasks.
- In most cases, only tasks that are clearly unsuitable for the IOI are removed at this step (e.g., tasks that are clearly out of syllabus, extremely standard/textbook problems, or task proposals without a description of the intended solution).
- These tasks are presented to the ISC, and graded according to the following criteria:
- Difficulty. The measure of difficulty is the expected number of perfect scores.
- In some cases, the difficulty of solving all but the last subtask is assessed independently.
- Solution novelty
- How standard/creative the expected solution is. Low solution novelty is a strong signal for not selecting a task. Tasks with medium solution novelty may be selected if they score high in other criteria.
- Task novelty
- How (non)standard the outlined problem is. While high novelty is a plus, low task novelty does not necessarily disqualify a task. In particular, a variant of a known problem that has a surprising/creative solution may make a great IOI problem.
- “Subtaskability”
- How many diverse subtasks can be proposed in the task. For task proposals that do not include many subtasks, the ISC/HSC typically discusses the possibility of adding new subtasks. A low score in this category is a strong signal for removing the task.
- Is the task accessible?
- A task is accessible if we expect ALL IOI contestants to be able to understand the task statement and solve at least one subtask. We attempt to have at least one accessible task on each contest day.
- Once the tasks have been graded, the final step is to choose a set of 6 tasks for the two contest days plus (typically 3) backup tasks. The main constraints are:
- More novel tasks are generally preferred.
- Each contest day should consist of (sub)tasks of varying difficulty.
- The topics covered by the tasks should be diverse.
- Each contest task should have a suitable replacement. Note that a backup task can serve as a replacement for multiple contest tasks.
- The total length of the task statements should be reasonable.
- The total length of the solutions of all tasks in a day should be reasonable, i.e., there should be at most one task with a lengthy model solution.
Updating test data and rejudging
The SC may occasionally update the test data and/or rejudge some submissions. This may lead to some scores changing. In particular, nondeterministic solutions may get a different number of points (either lower or higher) after every rejudge.
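As a hypothetical illustration (not tied to any real IOI task; all names below are made up), a time-seeded randomized solution of roughly the following shape may or may not find a good enough answer on a borderline test case within the time limit, so its score can change between rejudges:

    // Hypothetical sketch of a nondeterministic solution: the random seed is
    // taken from the clock, so each judging run explores different candidates
    // and a borderline test case may pass on one rejudge and fail on another.
    #include <bits/stdc++.h>
    using namespace std;

    int main() {
        int n;
        if (scanf("%d", &n) != 1) return 0;
        vector<int> a(n);
        for (int& x : a) scanf("%d", &x);

        mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

        long long best = LLONG_MIN;
        auto start = chrono::steady_clock::now();
        // Evaluate random candidate orderings until close to the time limit;
        // how many candidates fit (and which ones) differs between runs.
        while (chrono::duration_cast<chrono::milliseconds>(
                   chrono::steady_clock::now() - start).count() < 1900) {
            shuffle(a.begin(), a.end(), rng);
            long long value = 0;  // score of this candidate (problem-specific)
            for (int i = 0; i < n; i++) value += (long long)(i + 1) * a[i];
            best = max(best, value);
        }
        printf("%lld\n", best);
        return 0;
    }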
Rejudging submissions is often accompanied by changing the test data and/or graders. The following rules are typically followed:
- Rejudging solutions should be kept to a minimum.
- If some test data or grader is invalid, i.e., it does not conform to what is stated in the task description, the error should be fixed and all solutions should be rejudged.
- If, during the contest, the SC identifies an incorrect solution that gets points either because the test data/grader is weak or because the solution was “lucky” in its random choices, this is not sufficient grounds for changing the test data or rejudging submissions.
- However, if the SC identifies that the test data/grader is different from what was intended before the contest (e.g., the test data generator contains a bug and produces significantly simpler inputs, or the grader does not properly verify the correctness of the output), the test data/grader may be updated and submissions rejudged (a hypothetical illustration follows this list).
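As a hypothetical illustration of the last point (for an imaginary task whose input is meant to be a uniformly random permutation; the generator below is invented for this example), a generator bug of the following kind would make the actual test data significantly simpler than intended, and fixing it may justify regenerating the data and rejudging:

    // Hypothetical generator sketch. Intended behavior: output a uniformly
    // random permutation of 1..n. The bug shuffles only a short prefix, so the
    // generated permutations are nearly sorted and much easier than intended.
    #include <bits/stdc++.h>
    using namespace std;

    int main(int argc, char** argv) {
        // Assumed usage: ./gen <n> <seed>
        int n = atoi(argv[1]);
        unsigned seed = (unsigned)atoll(argv[2]);
        mt19937 rng(seed);

        vector<int> p(n);
        iota(p.begin(), p.end(), 1);

        // Intended: shuffle(p.begin(), p.end(), rng);
        shuffle(p.begin(), p.begin() + min(n, 10), rng);  // bug: prefix only

        printf("%d\n", n);
        for (int i = 0; i < n; i++) printf("%d ", p[i]);
        printf("\n");
        return 0;
    }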
Time extensions
A contestant may lose time during the contest due to unforeseen issues. This is always an unfortunate event, which introduces some amount of unfairness into the contest. A time extension can compensate for the time lost, but is also not an entirely fair solution, given that:
- This is a stressful situation, whose impact may be larger than just the time lost.
- If a contestant’s computer is unusable, the contestant can still think about solutions before a fix is deployed.
We believe that a contestant can be given a time extension when ALL of the following conditions are met:
- The contestant has lost a significant amount of time (10 or more minutes) OR the issue is resolved less than 10 minutes before the end of the contest.
- Rationale: otherwise the disruption is not severe; the contestant can use the time to think, take a snack/bathroom break, etc.
- The issue is not caused by the contestant.
- Example of an issue caused by the contestant: a machine hangs due to a solution using too much RAM.
- The duration of the issue can be reliably and independently verified by the IOI technical or scientific committee.
- Examples when this condition is met:
- A power / hardware failure affects the machines owned by the IOI organizers.
- A machine is misconfigured by the IOI organizers and unusable.
- Examples when this condition is not met:
- An internet failure / power failure / fire alarm disrupts the work of a contestant working remotely.
- A contestant declares that they are unwell.
- The set of affected contestants is identified before the end of the contest.
The above rules apply to the cases when the disruption is resolved at least 10 minutes before the end of the contest. The remaining cases are handled on a case-by-case basis.
Reverse-engineering the test data
The IOI 2021 rules describe the following cheating behavior:
contestants must not reverse engineer the test data in order to solve the problems in highly test-data-dependent manners. One example of such behavior is using the feedback system to extract the test data and then applying this knowledge to build solutions adapted to the specific test cases in the grading system. This behavior would be considered cheating only if a contestant submits a solution that would solve significantly fewer test cases correctly if the test data were replaced by an equivalent set of test cases (e.g., one generated with a different random seed).
Note that the “and” in “extract the test data and then applying this knowledge” is key in the formulation of the rule. In particular, extracting some information from the test data (e.g., using it for debugging) is acceptable, as long as the extracted knowledge is not used to solve the problem only for the particular test cases provided rather than in full generality.
We acknowledge that this rule is (deliberately) not very specific. In order to better explain the ISC’s stance, we provide examples of behaviors that the ISC does/does not consider cheating.
Examples of allowed behaviors:
- Adding assertions to the solution to assist in debugging (the contestant’s intention is to develop a correct solution).
- Tuning a performance-related parameter in the solution (again, the intention is to develop a solution that runs within the time limit on all inputs).
- Running different heuristics based on the structure of the input data, as long as the different cases considered are defined without extracting information from the test data (see the sketch after this list).
- Intentionally developing a solution that does not work on a large set S of valid inputs (hoping they are not present in the test data), where S is defined independently of the test data in the grading system.
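The following sketch (purely hypothetical; the task, thresholds, and function names are made up) combines several of the allowed behaviors above: an assertion used for debugging, a tuned size threshold, and a case split defined by the structure of the input itself rather than by the specific test cases in the grading system:

    // Hypothetical sketch of allowed behaviors; the strategy bodies are
    // placeholders and not meaningful for any real task.
    #include <bits/stdc++.h>
    using namespace std;

    long long solve_small(const vector<int>& a) {   // exact method for small inputs
        return accumulate(a.begin(), a.end(), 0LL);
    }
    long long solve_sorted(const vector<int>& a) {  // faster method for sorted inputs
        return a.empty() ? 0 : (long long)a.front() + a.back();
    }
    long long solve_general(const vector<int>& a) { // general heuristic
        return a.empty() ? 0 : *max_element(a.begin(), a.end());
    }

    long long solve(const vector<int>& a) {
        // Debugging aid: a failing assertion shows up as a runtime error in the
        // feedback, which is fine as long as the goal is a correct solution.
        assert(!a.empty());

        if (a.size() <= 2000)               // tuned performance threshold
            return solve_small(a);
        if (is_sorted(a.begin(), a.end()))  // a property of the input itself,
            return solve_sorted(a);         // not of the judge's specific tests
        return solve_general(a);
    }

    int main() {
        int n;
        if (scanf("%d", &n) != 1) return 0;
        vector<int> a(n);
        for (int& x : a) scanf("%d", &x);
        printf("%lld\n", solve(a));
        return 0;
    }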
Examples of cheating/highly suspicious behavior:
- Any code that attempts to isolate an individual test case, e.g., code like:
- if size(input_array) == 31337 and input_array[2] == 42: return heuristic2(input_array)
- where the conditions do not follow naturally from the problem statement, and heuristic2 is a procedure that clearly does not solve the problem in full generality.