Solutions to #10 Strings 15-295 Fall 2021

15295 Fall 2021 #10 Strings -- Problem Discussion

November 3, 2021

This is where we collectively describe algorithms for these problems. To see the problem statements and scoreboard, go to this page and select this contest.

A. Non-Substring Subsequence

B. Canine poetry

Any character s[i] in the string must be different with s[i-2], s[i-1], s[i+1], s[i+2]. We can do this greedily. For every i, if s[i] is the same as s[i-2] or s[i-1] we modify it to some other value and count the total number of modifications. We don’t need to find the new value here, since we’re guaranteed that some value will fit in here. I just modified it to ‘?’ to make it easy.

-- Yan

C. Suffix Structures

Consider the conditions under which we can make either the automaton only work, or the array only work. If the array is sufficient, then s is a permutation of t. To check for this, just count the number of each character a-z in both strings, and see whether the two are equal. If the automaton only is sufficient, then t must be a subsequence of s which can be checked in O(n) time. If both are needed, then for each character a-z, string t must contain that character less than or equal to how much string s does.

Therefore, run the subsequence check first to see if an automaton suffices. If not, traverse through both strings and count the number of each character. If the two strings are equal on all characters, then the array is sufficient, otherwise check whether it is viable to use both. If that fails, then state that we need the tree.

— Daniel

D. Palindromic characteristics

The key observation is that a string is a k-palindrome if and only if it is a palindrome, and the left and right half are (k-1)-palindromes. Once you convince yourself this is true the rest is very straightforward using dp. See Danny’s solution below :-)

-- Yan

Let me expand on Yan’s very terse description. If a string is a k-palindrome for k > 1, it’s also a k-1 palindrome. And by induction it’s also a k-2,k-3, … 1 palindrome. So for each range s[i…j] of the string, there is a k such that that string is a k palindrome, but not a k+1 palindrome. So our first goal is to compute this value for each range [i…j].

When working with strings it’s often useful to number the boundary between neighboring characters. So in a string of length n there are n+1 such boundaries numbered 0…n. So our DP table DP[i,j] is the palindrome number for the characters s[i]...s[j-1].

We initialize the DP table as follows. First set DP[i,i] = DP[i,i+1] = 1 for all i. (and 0 otherwise). Then for length = j-i = 2, 3, … we apply this: DP[i,j] = if (s[i] = s[j-1]) then DP[i+1,j-1] else 0. This correctly finds all the regular (order 1) palindromes.

Now, for length = j-i goes from 2 to n do this:

DP[i,j] = if (DP[i,j] = 0) then 0 else 1 + min (DP[i, i + length/2], DP[i+length/2 + b, j])

Where b is 0 if length is even and 1 if length is odd.

Now the palindromic order of each substring has been computed. We simply take a histogram of this, and then do a postfix sum to compute all the desired counts.

The running time and space is O(n^2).

– Danny

Alternate DP solution:

In total there are 2n-1 possible “lines of symmetry” that a palindrome could have (either on a character, such as for “aba”, or between 2 characters, like in “bb”). We will make a table to store, for each line of symmetry, and then for each character in the string, what level palindrome (if any) is the substring starting on that character, with that line of symmetry.

As an example, for the string “baacaad”, and line of symmetry on character “c”, the possible substrings are “baacaad”, “aacaa”, “aca”, “c”, which have levels 0, 3, 2, 1, respectively.

To actually create this table, we process the string from left to right. For each new character, we consider all lines of symmetry to the left, and check whether our character mirrored around that line is equal to the corresponding character. If it isn’t, we flag that line of symmetry as “inactive”, since all future substrings centered around that line are guaranteed to not be palindromes as well. Otherwise, if we found a match (on an active line of symmetry), we look up the palindrome level of the left half of the substring in our table, add 1 to it, and store it under our new substring. (In the case where our total string is length 1, i.e. the line of symmetry lies on the character we are looking at, the left half just has length 0, so we can default to palindrome level 0).

This takes O(n^2) time and memory. - adam

E. Pattern Matching

Construct the following directed graph: each pattern in our list is a node, and there is an edge from node u to node v if v’s string pattern matches to u’s string. Now, it’s clear that what we want is a topological ordering of these nodes (such that the mt ordering condition holds), or a failure if one doesn’t exist. All that remains is constructing this graph… clearly trying every pattern match combination is O(n^2) and thus too slow; we need something faster.

For this, we use the k <= 4 condition. A pattern with w wildcards can only match at most 2^w other patterns. So, for each string, generate these 2^w “subpatterns” and determine their corresponding node in the graph, if there exists one (there are many ways to do this. I interpreted strings as base-27 numbers and used array access so this was only constant overhead). Now, we handle the mt condition: simply add an outgoing edge from mt[i] to every node that p[i] pattern matches to. This will take care of our ordering condition, and produce a cyclic graph if it’s impossible to satisfy. Once we have our graph, run a topological sort algorithm and we’re done.

--Aditya

Actually just using unordered_map with string works for E, but it doesn’t work for F. --Yan

F. DZY Loves Strings

First compute the list of positions of all substrings of length <= 4 in the string in a sorted way. Since length <= 4 we can just do this using the most naive way in O(n) time. Then, for each query, we take out the two sorted lists, and find the closest pair using two-pointers. This is a very classical problem, for each element in list a, just find the first element in list b that is larger than it. Then do the same thing again in reversed order. Computing this query to (a,b) takes O(#occurence of a + #occurrence of b) time, if both occurrences are not 0, and O(1) if any of the occurrences is 0. We can also use binary search to speed up here.

However, this naive approach might take a very long time, for example string is aaaaa….aaaa and all the queries are from a, aa, aaa, aaaa, so each query can take up to O(n) time. To solve this, we can use a table to save the queries that we have already seen. For the worst case running time, in order to make the time as long as possible we need the queries to be distinct, otherwise it’s just O(1). The worst case I can think of is we have t large values that sums up to n-q/t+t, and q/t-t small values, so total running time = nq/t - q^2/t^2 + q <= n^2/4 + q = 6e8 (honestly I think this is probably still incorrect, but anyways this cannot be that large in practice), which I’m not sure if this is achievable, but it is enough for us to pass. I give up on analyzing this but anyway we can pass.

P.S. You might need to convert the string to an int and use an array to store the result. Using an unordered_map with strings gives me TLE. To do this simply view the string as a base-27 integer.

--Yan

G. Death DBMS