
## Algorithmics on SLP-compressed strings: a survey (2012)

Venue: Groups Complex. Cryptol.

Citations: 10 (3 self)

### Citations

4809 | Introduction to automata theory, languages and computation, Addison-Wesley Pub
- Hopcroft, Ullman
- 1979
Citation Context ...omputational model. Unless otherwise specified, we always refer to the random access model (RAM) with uniform cost measure. The latter means that arithmetic operations on arbitrary numbers can be carried out in constant time. This might seem unrealistic. On the other hand, all algorithms mentioned in this paper only have to store numbers with O(n) many bits, where n is the length of the input. We also assume some basic knowledge in automata theory. In particular, the reader should be familiar with regular languages and context-free languages. Background can be found in the classical text book [62]. 2.2 Straight-line programs Definition 1 (straight-line program (SLP)) A straight-line program (SLP) over the terminal alphabet Γ is a context-free grammar A = (V,Γ, S, P ) (V is the set of nonterminals, Γ is the set of terminals, S ∈ V is the initial nonterminal, and P ⊆ V × (V ∪ Γ)∗ is the set of productions) such that the following two conditions hold: (1) For every A ∈ V there exists exactly one production of the form (A,α) ∈ P for α ∈ (V ∪ Γ)∗. (2) The relation {(A,B) |(A,α) ∈ P,B ∈ alph(α)} is acyclic. A production (A,α) is also written as A → α. Clearly, the language generated by the S... |
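
The definition above is easy to make concrete: an SLP is just a map from each nonterminal to its unique right-hand side, and shared subwords are represented once. The sketch below (representation and names are illustrative, not from the survey) shows how n + 1 productions can derive a string of length 2^n.

```python
# Minimal sketch of an SLP: a dict mapping each nonterminal to its unique
# right-hand side, a sequence of nonterminals and terminal characters.
# Acyclicity (condition (2) of the definition) is assumed, not checked.

def eval_slp(productions, start):
    """Expand the SLP from `start`; memoize so shared subterms expand once."""
    cache = {}
    def ev(sym):
        if sym not in productions:      # terminal symbol
            return sym
        if sym not in cache:
            cache[sym] = "".join(ev(s) for s in productions[sym])
        return cache[sym]
    return ev(start)

# n + 1 productions generating a^(2^n):  A_i -> A_{i+1} A_{i+1},  A_n -> a
n = 10
prods = {f"A{i}": [f"A{i+1}", f"A{i+1}"] for i in range(n)}
prods[f"A{n}"] = ["a"]
assert eval_slp(prods, "A0") == "a" * 2**n   # string of length 1024
```

The exponential gap between grammar size and string length is exactly what makes the algorithmic questions in the survey non-trivial.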

1121 | Algorithms on Strings, Trees, and Sequences
- Gusfield
- 1997
Citation Context ...und for grammar-based compression is provided by the following result that was shown independently by Charikar et al. [23] and Rytter [119]: Theorem 8 ([23, 119]) There is an O(log(|Σ|) · n)-time algorithm that computes for a given word w ∈ Σ∗ of length n an SLP A such that eval(A) = w and |A| ≤ O(log(n/opt(w)) · opt(w)) (i.e., the approximation ratio of the algorithm is O(log(n/opt(w)))). The algorithms of Charikar et al. and Rytter follow a similar strategy. First, the LZ77-encoding of the input string w is computed. This is possible in time O(log(|Σ|) · |w|) using suffix trees, see e.g. [53]. Let m be the number of factors in the LZ77-factorization of w. By Theorem 5, we have m ≤ opt(w). In a second step, the LZ77-encoding of w is transformed into an SLP for w of size at most O(log(|w|/m) · m) ≤ O(log(|w|/opt(w)) · opt(w)) using Theorem 6. Rytter’s technique [119] can also be used to transform an SLP A for a string of length n in time O(|A| · log(n)) into an equivalent SLP of height O(log(n)), where the height of an SLP is the height of the corresponding derivation tree. Hence, there is a very efficient algorithm for making SLPs balanced. Let us remark that the existence of a pol...
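
The first step of that strategy, LZ77 factorization, can be sketched naively (quadratic time; the cited algorithms achieve O(log(|Σ|) · |w|) with suffix trees). The function name is illustrative.

```python
def lz77_factorize(w):
    """Greedy LZ77 factorization: each factor is the longest prefix of the
    remaining suffix that already occurs starting at an earlier position
    (the earlier occurrence may overlap the factor), or a fresh letter."""
    factors, i = [], 0
    while i < len(w):
        l = 1
        # extend while w[i:i+l+1] has an occurrence starting left of i
        while i + l < len(w) and 0 <= w.find(w[i:i + l + 1]) < i:
            l += 1
        factors.append(w[i:i + l])
        i += l
    return factors

assert lz77_factorize("abab") == ["a", "b", "ab"]
assert lz77_factorize("aaaaaaaa") == ["a", "aaaaaaa"]
```

On repetitive inputs the number of factors m stays small, which by Theorem 5 lower-bounds nothing worse than opt(w).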

821 | Fast pattern matching in strings
- Knuth, Morris, et al.
- 1977
Citation Context ...e 3 shows how to construct A (in logspace). Now, it is an easy observation that dH(eval(A), eval(B)) is exactly the number of tuples x ∈ {0, 1} n such that x · (w1, . . . , wn) = t. 6 Compressed pattern matching A natural generalization of checking equality of two strings is pattern matching. In the classical pattern matching problem it is asked for given strings p (usually called the pattern) and t (usually called the text), whether p is a factor of t. There are dozens of linear time algorithms for this problem on uncompressed strings, most notably the well-known Knuth-Morris-Pratt algorithm [75]. It is therefore natural to ask, whether a polynomial time algorithm for pattern matching on SLP-compressed strings exists; this problem is sometimes called fully compressed pattern matching and is defined as follows: input: Two SLPs P and T question: Is eval(P) a factor of eval(T)? The first polynomial time algorithm for fully compressed pattern matching was presented in [73], further improvements with respect to the running time were achieved in [42, 67, 81, 106]. Theorem 12 ([42, 67, 73, 81, 106]) Fully compressed pattern matching can be solved in polynomial time. Basically, the algorithms... |

619 | Combinatorial Group Theory
- Lyndon, Schupp
- 1977
Citation Context ...oded number 1 ≤ i ≤ n. Some lower bounds on the preprocessing and query time for the compressed querying problem can be found in [24]. 11 Compressed word problems for groups In recent years, techniques for SLP-compressed strings were successfully applied in order to get more efficient algorithms for problems in combinatorial group theory. In this section, we briefly survey some of these results. We will restrict our consideration to the compressed word problem and its application to the word problem for automorphism groups. Background in combinatorial group theory can be found for instance in [96, 131]. Let G be a finitely generated group, and let Σ be a finite generating set for G. This means that every element of G can be written as a product of elements from Σ∪Σ−1. The word problem of G with respect to Σ is the following decision problem: input: A word w ∈ (Σ ∪ Σ−1)∗ question: Does w = 1 hold in the group G? 20 It is well known and easy to see that if Γ is another finite generating set for G, then the word problem for G with respect to Σ is logspace many-one reducible to the word problem for G with respect to Γ. This justifies to speak just of the word problem for the group G. The word p... |

321 | Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata. release 1.10.2002
- Comon, Dauchet, et al.
- 1997
Citation Context ... given sets of equations in polynomial time, whether they define isomorphic regular words. The proof is based on a reduction to equality for SLP-compressed strings. 15 Extensions to trees and pictures The idea of grammar-based compression can be extended from strings to more complicated structures. In fact, all one needs is a set of operations on a domain D of objects, that construct “larger” D-objects from smaller ones. Using straight-line programs over D with these operations allows to construct “very large” D-objects. 15.1 Trees For background on trees, tree grammars, and tree automata see [27]. The concept of an SLP can easily be generalized from strings to trees. Here, trees are rooted node-labelled trees. Every node has a label from an alphabet Σ. Moreover, with every symbol a ∈ Σ a natural number (the rank of a) is associated. If a tree node v is labelled with a symbol of rank n, then v has exactly n children, which are linearly ordered. Such trees can conveniently be represented as terms. The size |t |of a tree t is the number of nodes of t. Here is an example: Example 35 Let f be a symbol of rank 2, h a symbol of rank 1, and a a symbol of rank 0 (a constant). Then the term h(f... |

270 | Bounded-width polynomial-size branching programs recognize exactly those languages in NC1
- Barrington
Citation Context ...oblem asks whether a given circuit over M (i.e., a circuit, where all input gates are labelled with constants from M and all internal gates are labelled with the multiplication operation of M) evaluates to a given element of M . A circuit over M can be viewed as an SLP that generates 15 a word over the alphabet M . The second statement of Theorem 18 follows, since the set of all words over the alphabet M that evaluate to a particular element of M is a regular language. P-hardness of the circuit evaluation problem for a finite non-solvable group is shown in [10] using a technique of Barrington [9]. In this paper, Barrington proved that the word problem for every finite non-solvable group is complete for the circuit complexity class NC1. 9.2 Compressed membership for regular expressions When the regular language is not given by an automaton but by a regular expression, the complexity of the compressed membership problem depends on the operators that we allow in expressions. The same phenomenon is already well-known for uncompressed strings. For instance, whereas the membership problem for ordinary regular expressions (where only the Kleene-operators ·,∪, ∗ are allowed) is NL-complete [6... |
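
The observation that an SLP over a finite monoid can be evaluated without expanding the word is worth a sketch: map each nonterminal bottom-up to its value in the monoid. This is also why compressed membership in a fixed regular language is easy (evaluate in the finite transition monoid of an automaton). The monoid chosen below, integers mod 5 under addition, is an illustrative stand-in, not from the survey.

```python
def eval_slp_in_monoid(prods, order, op, const):
    """prods: X -> terminal symbol or pair (Y, Z); order: topological
    (right-hand-side nonterminals before X). Returns the monoid value of
    the word derived by each nonterminal -- no string is ever expanded."""
    val = {}
    for X in order:
        rhs = prods[X]
        if isinstance(rhs, str):          # X -> a : value of the letter
            val[X] = const[rhs]
        else:                             # X -> Y Z : multiply the values
            val[X] = op(val[rhs[0]], val[rhs[1]])
    return val

# SLP deriving a^(2^20); its value in (Z/5, +) with const a -> 1 is 2^20 mod 5
prods = {f"A{i}": (f"A{i+1}", f"A{i+1}") for i in range(20)}
prods["A20"] = "a"
val = eval_slp_in_monoid(prods, [f"A{i}" for i in range(20, -1, -1)],
                         lambda x, y: (x + y) % 5, {"a": 1})
assert val["A0"] == 2**20 % 5
```

One pass over the grammar, constant work per production: polynomial time regardless of the (possibly astronomic) length of the derived word.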

188 | Visibly Pushdown Languages
- Alur, Madhusudan
- 2004
Citation Context ...ushdown does not decrease in every step, whereas in the second phase, the height of the pushdown does not increase in every step. Moreover, in [21] also a deterministic 1-counter language L (i.e., a context-free language that is accepted by a deterministic pushdown automaton with a unary pushdown alphabet) with PSPACE = LEAFlog(L) is constructed. Hence, there also exist fixed 1-turn deterministic context-free languages and deterministic 1-counter languages with a PSPACE-complete compressed membership problem. In [88] it was shown that even the important subclass of visibly pushdown languages [6] contains languages with a PSPACE-complete compressed membership problem. Let us first define visibly pushdown automata and the associated languages. Let Σc and Σr be two disjoint finite alphabets (call symbols and return symbols) and let Σ = Σc ∪ Σr. A visibly pushdown automaton (VPA) [6] over (Σc,Σr) is a tuple V = (Q, q0,Γ,⊥,∆, F ), where Q is a finite set of states, q0 ∈ Q is the initial state, F ⊆ Q is the set of final states, Γ is the finite set of stack symbols, ⊥ ∈ Γ is the initial stack symbol, and ∆ ⊆ (Q × Σc × Q × (Γ \ {⊥})) ∪ (Q × Σr × Γ × Q) is the set of transitions. A configura...

186 | The problem of solvability of equations in a free semigroup
- Makanin
- 1977
Citation Context ...e e.g. [33]. Deciding whether a given word equation has a solution turned out to be very difficult. In the 1950’s, Markov conjectured that it is undecidable whether a given word equation has a solution. He knew that solvability of word equations can be reduced to Hilbert’s 10th problem, and his goal was to prove the undecidability of Hilbert’s 10th problem by proving undecidability of solvability of word equations. Whereas Hilbert’s 10th problem was finally shown to be undecidable by the seminal work of Matiyasevich from 1971, solvability of word equations was shown to be decidable by Makanin [98] in 1977. Makanin’s algorithm is very complicated, both with respect to its computational complexity and its technical difficulty. Over the years, the estimate on the time and space complexity of Makanin’s algorithm was gradually improved. Currently, the best known bound is EXPSPACE [54]. In 1999, Plandowski came up with an alternative algorithm for checking solvability of word equations, which put the problem into PSPACE [113]. This is still the best upper complexity bound. The best known lower bound is NP, and it was repeatedly conjectured that the precise complexity is NP too. Now, consider... |

153 | Uniqueness theorems for periodic functions.
- Fine, Wilf
- 1965
Citation Context ... |B|) for checking equality of SLP-compressed strings, which is currently the best upper bound. One can view the algorithms from [5, 105] as efficient methods for transforming an SLP A into a canonical SLP A such that eval(A) = eval(B) if and only if A′ and B′ are identical. One should note that the machine model in [5] is a RAM that allows to compute bitwise XOR and AND of machine words in constant time. In contrast to [105], the polynomial time algorithms of Hirshfeld et al. [61] and Plandowski [112] use combinatorial properties of words, in particular the periodicity lemma of Fine and Wilf [39]. This lemma states that if p and q are periods of a string w (i.e., w[i] = w[i+p] and w[j] = w[j +q] for all positions 1 ≤ i ≤ |w |− p and 1 ≤ j ≤ |w |− q) and p + q ≤ |w |then also the greatest common divisor of p and q is a period of w. The algorithm from [61] achieves a running time of O(n4), where n = |A |+ |B|. Both, Plandowski and Hirshfeld et al. use Theorem 9 as a tool to solve another problem. Plandowski derives from Theorem 9 a polynomial time algorithm for testing whether two given morphisms (between free monoids) agree on a given context-free language. Hirshfeld et al. use Theorem... |
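
The periodicity lemma as stated can be confirmed exhaustively on small inputs; a brute-force sketch (purely illustrative, not an efficiency claim):

```python
from itertools import product
from math import gcd

def periods(w):
    """All p with w[i] == w[i + p] wherever both positions exist."""
    return {p for p in range(1, len(w) + 1)
            if all(w[i] == w[i + p] for i in range(len(w) - p))}

# Check Fine and Wilf on every binary string up to length 12: if p and q
# are periods of w with p + q <= |w|, then gcd(p, q) is also a period.
for n in range(1, 13):
    for tup in product("ab", repeat=n):
        w = "".join(tup)
        ps = periods(w)
        assert all(gcd(p, q) in ps
                   for p in ps for q in ps if p + q <= len(w))
```

For example, "abaab" has periods {3, 5}; the precondition p + q ≤ |w| never fires there, which is consistent with the lemma.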

127 | Morse theory and finiteness properties of groups
- Bestvina, Brady
- 1997
Citation Context ...time [58]. In all these results, polynomial time can be replaced by any complexity class that is closed under polynomial time Turing-reductions. Results (1)–(5) yield polynomial time algorithms for a quite large class of groups. First of all, (5) implies by taking G1 = G2 = · · · = Gn = Z that the compressed word problems for rightangled Artin groups (which are also known as graph groups or free partially commutative groups) 22 can be solved in polynomial time. Due to their rich subgroup structure, right-angled Artin groups received a lot of attention in group theory during the last few years [14, 30, 46]. With (1) and (5) it follows that for every group G that virtually embeds into a right-angled Artin group (i.e., G has a finite index subgroup that embeds into a right-angled Artin group) the compressed word problem can be solved in polynomial time. Groups that virtually embed into a right-angled Artin group are also known as virtually special. Recently, it has been shown that virtually special groups form a quite large class of groups. For instance, every Coxeter group is virtually special [56] and every fully residually free group is virtually special [141].6 It follows that for every Coxet... |

120 | Offline dictionary-based compression, in:
- Larsson, Moffat
- 1999
Citation Context .... , km of binary encoded natural numbers is given, and one is looking for a smallest circuit over (N,+) such that every ki is the value of some gate. Currently the best approximation algorithm for this problem is by Yao [143] (from 1976) and its approximation ratio is O(log n/ log log n), where n = k1 + · · · + km. Some optimizations of Rytter’s algorithm can be found in [17]. Another grammar-based compression algorithm with linear running time (for a fixed alphabet) and an approximation ratio of O(log(n/opt(w))) was presented by Sakamoto [121]. His algorithm is based on the RePair compressor [78], which is another popular grammar-based compressor. The algorithms from [23, 119, 121] all use space Ω(n) in order to generate an SLP for a string of length n, which might be a problem for large input texts. A grammar-based compression algorithm with an approximation ratio of O(log2(n)) that runs in linear time and needs no more space than the size of the output grammar was recently presented in [101]. Grammar-based compression using a weak form of context-sensitive grammars (so called Σ- sensitive grammars) is discussed in [102]. 5 Compressed equality checking The most basic task for SLP-com... |
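
The RePair idea mentioned above is simple to sketch naively (the actual compressor uses priority queues and pair records to run in linear time; names and the termination threshold here are illustrative):

```python
from collections import Counter

def repair(w):
    """Naive RePair: repeatedly replace a most frequent adjacent pair by a
    fresh nonterminal until no pair occurs twice. Replacement is greedy
    left-to-right; overlapping occurrences are counted naively. Fresh
    names N0, N1, ... are assumed not to clash with input symbols."""
    seq, rules, fresh = list(w), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            return seq, rules
        (a, b), cnt = pairs.most_common(1)[0]
        if cnt < 2:
            return seq, rules
        X = f"N{fresh}"; fresh += 1
        rules[X] = (a, b)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(X); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out

seq, rules = repair("abab")
assert seq == ["N0", "N0"] and rules == {"N0": ("a", "b")}
```

The remaining sequence plus the pair rules form a grammar (an SLP once a start production for the sequence is added).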

114 | Let sleeping files lie: pattern matching in Zcompressed files.
- Amir, Benson, et al.
- 1996
Citation Context ...he pattern and the text are compressed. In practical applications, the pattern is usually short in comparison to the text. In such a situation, it makes sense to consider the weaker variant, where only the text is compressed, but the pattern is given explicitly. This leads to the semi-compressed pattern matching problem (sometimes just called compressed pattern matching problem), where it is asked whether an uncompressed pattern p is a factor of a compressed text eval(T). Semi-compressed pattern matching has been studied intensively for various compression schemes, e.g. LZW (Lempel–Ziv–Welch) [7, 44] and LZ1 (a variant of LZ77 as defined in Section 3) [38, 45]. For SLP-compressed texts it is easy to show that compressed pattern matching can be solved in time O(|p| · |T|). Assume that the text SLP T is in Chomsky normal form. As remarked earlier, it suffices to check for each nonterminal X of T, whether there exists an occurrence of p in evalT(X) that touches the cut of X. For this, one computes for every nonterminal X the prefix prefX of evalT(X) of length min{|p|, |evalT(X)|} as well as the suffix suffX of evalT(X) of length min{|p|, |evalT(X)|}. This is possible in time O(|p| · |T|) by a ...

106 | String matching in Lempel-Ziv compressed strings.
- Farach, Thorup
- 1998
Citation Context ...tions, the pattern is usually short in comparison to the text. In such a situation, it makes sense to consider the weaker variant, where only the text is compressed, but the pattern is given explicitly. This leads to the semi-compressed pattern matching problem (sometimes just called compressed pattern matching problem), where it is asked whether an uncompressed pattern p is a factor of a compressed text eval(T). Semi-compressed pattern matching has been studied intensively for various compression schemes, e.g. LZW (Lempel–Ziv–Welch) [7, 44] and LZ1 (a variant of LZ77 as defined in Section 3) [38, 45]. For SLP-compressed texts it is easy to show that compressed pattern matching can be solved in time O(|p| · |T|). Assume that the text SLP T is in Chomsky normal form. As remarked earlier, it suffices to check for each nonterminal X of T, whether there exists an occurrence of p in evalT(X) that touches the cut of X. For this, one computes for every nonterminal X the prefix prefX of evalT(X) of length min{|p|, |evalT(X)|} as well as the suffix suffX of evalT(X) of length min{|p|, |evalT(X)|}. This is possible in time O(|p| · |T|) by a straightforward bottom-up computation. Then, one only has to ...
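
The bottom-up prefX/suffX computation just described can be sketched directly; the grammar representation below is illustrative, not from the survey.

```python
def semi_compressed_match(rules, order, p):
    """rules: X -> terminal character or pair (Y, Z) (Chomsky normal form);
    order: topological, right-hand sides before X. Decides whether the
    uncompressed pattern p occurs in the text the SLP derives."""
    k = len(p)
    pref, suff = {}, {}
    for X in order:
        rhs = rules[X]
        if isinstance(rhs, str):                    # X -> a
            pref[X] = suff[X] = rhs
        else:                                       # X -> Y Z
            Y, Z = rhs
            pref[X] = (pref[Y] + pref[Z])[:k]       # first min(k, |eval(X)|) chars
            suff[X] = (suff[Y] + suff[Z])[-k:]      # last  min(k, |eval(X)|) chars
    # every occurrence of p lies inside a single letter or touches the cut
    # of some X -> Y Z, i.e. falls into the window suffX(Y) + prefX(Z)
    for X in order:
        rhs = rules[X]
        window = rhs if isinstance(rhs, str) else suff[rhs[0]] + pref[rhs[1]]
        if p in window:
            return True
    return False

# SLP for "abab": X3 -> X1 X2 ("ab"), X4 -> X3 X3
rules = {"X1": "a", "X2": "b", "X3": ("X1", "X2"), "X4": ("X3", "X3")}
order = ["X1", "X2", "X3", "X4"]
assert semi_compressed_match(rules, order, "ba")
assert not semi_compressed_match(rules, order, "aa")
```

Each window has length at most 2|p|, so total work stays within the O(|p| · |T|) bound from the text (up to the cost of the window searches).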

104 | Word processing in groups, Jones and Bartlett
- Epstein, Cannon, et al.
- 1992
Citation Context ...r G with respect to Γ. This justifies to speak just of the word problem for the group G. The word problem was introduced in the pioneering work of Dehn from 1910 [32] in relation with topological questions. Novikov [109] and independently Boone [16] constructed examples of finitely presented groups (i.e., finitely generated groups with only finitely many defining relations) with an undecidable word problem. Despite this negative result, many natural classes of groups with decidable word problems were found. Prominent examples are for instance finitely generated linear groups, automatic groups [36], and one-relator groups. With the advent of computational complexity theory, also the complexity of word problems became an active research area. For instance, it was shown that for a finitely generated linear group the word problem can be solved in logarithmic space [84, 130], that automatic groups have polynomial time solvable (in fact quadratic) word problems [36], and that the word problem for a one-relator group is primitive recursive [19]. For a group G let Aut(G) be the automorphism group of G. In general, Aut(G) is not necessarily finitely generated even if G is finitely generated. Fo... |

104 | Lower bounds for natural proof systems.
- Kozen
- 1977
Citation Context ...wn in [86] by a reduction from the quantified boolean satisfiability problem; the reduction is mainly taken from [99], where it was shown that model checking the temporal logic LTL on SLP-compressed strings is PSPACE-complete. PSPACE-hardness for {·,∪, 2} and {·,∪, ∗,∩} is shown by generic reductions from the acceptance problem for a polynomial space bounded machine. Finally, the PSPACE lower bound for {·,∪, ∗, ||} is proved by a reduction from the intersection nonemptiness problem for regular expressions (over the standard operator set {·,∪, ∗}), which is a well-known PSPACE-complete problem [77]. The ATIME(exp(n), O(n))- hardness part from (4) uses a corresponding result from [89] on the model-checking problem for monadic second-order logic over SLP-compressed words over a unary alphabet. 9.3 Compressed finite automata A compressed nondeterministic finite automaton (CNFA for short) is a tuple A = (Q,Σ, δ, q0, F ), where Q is a finite set of states, Σ is a finite alphabet, q0 ∈ Q is the initial state, F ⊆ Q is the set of final states, and δ is a finite set of transitions of the form (p, A, q), where p and q are states and A is an SLP over Σ. The size of A is |A |= |Q|+ ∑ (p,A,q)∈δ |A|... |

88 | Two-dimensional languages.
- Giammarresi, Restivo
- 1997
Citation Context ...the input TSLP A into a TSLP B such that eval(A) = eval(B) and every nonterminal of B has rank at most one. This is the difficult step in [92]. (2) Evaluate the input tree automaton A on eval(B) in polynomial time, using the fact that every nonterminal of B has rank at most one, see [90]. Several other algorithmic problems for TSLP-compressed input trees are studied in [20, 41, 80, 127, 128]. 15.2 Pictures A picture over the alphabet Σ is a mapping p : {1, . . . , n} × {1, . . . ,m} → Σ for some n,m ≥ 1. We say that (n,m) is the dimension of the picture p. For more information on pictures see [47]. A picture can be viewed as a 2-dimensional word. We define two partial concatenation operations for pictures. Given two pictures p1 and p2, where p1 has dimension (n,m) and p2 has dimension (n, k), we define the horizontal concatenation (p1, p2) : {1, . . . , n} × {1, . . . ,m + k} → Σ (a picture of dimension (n,m + k)) as follows, where x ∈ {1, . . . , n} and y ∈ {1, . . . ,m + k}: (p1, p2)(x, y) = p1(x, y) if y ≤ m, and (p1, p2)(x, y) = p2(x, y − m) if y > m. Given two pictures p1 and p2, where p1 has dimension (m,n) and p2 has dimension (k, n), we define the vertical concatenation (p2 stacked over p1) : {1, . . . ,m + k...
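
The two partial concatenation operations are easy to sketch with pictures stored row-major as lists of rows. The representation is illustrative; in particular, which operand ends up on top in the excerpt's stacked vertical-concatenation notation is not fully recoverable, so the order below is a choice.

```python
def hcat(p1, p2):
    """Horizontal concatenation: defined only when the row counts agree."""
    assert len(p1) == len(p2), "pictures must have the same number of rows"
    return [r1 + r2 for r1, r2 in zip(p1, p2)]

def vcat(p1, p2):
    """Vertical concatenation: defined only when the column counts agree."""
    assert len(p1[0]) == len(p2[0]), "pictures must have the same number of columns"
    return p1 + p2

p1 = [["a", "b"], ["c", "d"]]            # dimension (2, 2): 2 rows, 2 columns
p2 = [["x"], ["y"]]                      # dimension (2, 1)
assert hcat(p1, p2) == [["a", "b", "x"], ["c", "d", "y"]]
assert vcat(p1, [["e", "f"]]) == [["a", "b"], ["c", "d"], ["e", "f"]]
```

Straight-line programs over these two operations then generate very large pictures, exactly as in the string case.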

76 | A subquadratic sequence alignment algorithm for unrestricted cost matrices, in:
- Crochemore, Landau, et al.
- 2002
Citation Context ... group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed representation we want to use. Such a representation should have two properties: (i) it should cover many compression schemes from practice and (ii) it should be mathematically easy to handle. Straight-line programs (SLPs) have both of these properties. An SLP is a context-free grammar that generates exactly one word. In an SLP, repeated subpatterns in a string have to be represented only once by introducing a non-terminal for the pattern. It is not difficult to see that with an SLP 1 consisting of n ... |

74 | The word problem.
- Boone
- 1959
Citation Context ...of elements from Σ∪Σ−1. The word problem of G with respect to Σ is the following decision problem: input: A word w ∈ (Σ ∪ Σ−1)∗ question: Does w = 1 hold in the group G? 20 It is well known and easy to see that if Γ is another finite generating set for G, then the word problem for G with respect to Σ is logspace many-one reducible to the word problem for G with respect to Γ. This justifies to speak just of the word problem for the group G. The word problem was introduced in the pioneering work of Dehn from 1910 [32] in relation with topological questions. Novikov [109] and independently Boone [16] constructed examples of finitely presented groups (i.e., finitely generated groups with only finitely many defining relations) with an undecidable word problem. Despite this negative result, many natural classes of groups with decidable word problems were found. Prominent examples are for instance finitely generated linear groups, automatic groups [36], and one-relator groups. With the advent of computational complexity theory, also the complexity of word problems became an active research area. For instance, it was shown that for a finitely generated linear group the word problem can be solv... |

73 | On the complexity of numerical analysis
- Allender, Bürgisser, et al.
- 2006
Citation Context ...ry node of indegree n ≥ 1 is labelled with an operation fi of arity n, • the incoming edges for a node are linearly ordered, and • there is a unique node of outdegree 0 (the output node of D). Such a circuit computes an element of A. An SLP over the terminal alphabet Γ is nothing else than a circuit over the free monoid Γ∗, where the only operation is concatenation and the constants are the symbols from the alphabet Γ. It should be noted that in the literature, the term “straight-line program” often refers to circuits over the ring of integers or, more generally, polynomial rings, see e.g. [3, 63, 70]. For some applications, an extension of SLPs called composition systems in [43] are useful. A composition system A is defined as an SLP but may also contain productions of the form X → Y [i, j] for i, j ∈ N with 1 ≤ i ≤ j. Assume that the string w = evalA(Y ) is already defined. Then we let evalA(X) = w[i,min{|w|, j}]. A composition system is in Chomsky normal form if all productions have the form X → a, X → Y Z or X → Y [i, j] for nonterminals X, Y , and Z and a terminal symbol a. The following result was shown by Hagenah in his PhD thesis [55] (in German), see also [125]. Theorem 4 From a ...
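
The semantics of composition systems can be sketched by direct expansion (exponential in general, so purely illustrative; we assume the usual clipping convention eval(X) = w[i, min{|w|, j}] for X → Y[i, j], with 1-indexed positions):

```python
def eval_composition_system(prods, order):
    """prods: X -> terminal char, pair (Y, Z), or triple (Y, i, j) standing
    for the production X -> Y[i, j]. order: topological (right-hand sides
    before X). Evaluates by full expansion to illustrate the semantics."""
    val = {}
    for X in order:
        rhs = prods[X]
        if isinstance(rhs, str):            # X -> a
            val[X] = rhs
        elif len(rhs) == 2:                 # X -> Y Z
            val[X] = val[rhs[0]] + val[rhs[1]]
        else:                               # X -> Y[i, j], 1-indexed, clipped
            Y, i, j = rhs
            w = val[Y]
            val[X] = w[i - 1:min(len(w), j)]
    return val

# C derives a^4; D -> C[2, 3] derives the factor "aa"
prods = {"A": "a", "B": ("A", "A"), "C": ("B", "B"), "D": ("C", 2, 3)}
val = eval_composition_system(prods, ["A", "B", "C", "D"])
assert val["C"] == "aaaa" and val["D"] == "aa"
```

Theorem 4 (Hagenah) is what makes this extension harmless: composition systems can be converted back into ordinary SLPs with polynomial overhead.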

67 | Generic-case complexity, decision problems in group theory, and random walks
- Kapovich, Miasnikov, et al.
- 2003
Citation Context ... of evalB(Z ′), i.e., we compute the maximal number n such that (evalB(Y ′)[m − n + 1,m])−1 = a−1m a −1 m−1 · · · a −1 m−n+1 = b1b2 · · · bn = (evalB(Z ′))[1, n] (1) For a particular n we can check equation (1) in polynomial time by the extension of Theorem 9 to composition systems (which holds by Theorem 4). Now, the maximal n satisfying (1) can be computed in polynomial time as well using binary search. Finally, we add to the composition system B the production X ′ → Y ′[1,m − n]Z ′[n + 1, k]. A direct corollary of Theorem 26 and 28 is the following result, which solves an open problem from [71]. Corollary 29 ([125]) Let F be a finitely generated free group. The word problem for Aut(F ) (the automorphism group of F , which is finitely generated) can be solved in polynomial time. Several closure properties for the complexity of the compressed word problem where shown: (1) If H is a finitely generated subgroup of G and the compressed word problem for G can be solved in polynomial time, then the same holds for H [95]. (2) If G is a finite-index subgroup of H and the compressed word problem for G can be solved in polynomial time, then the same holds for H [95]. (3) Let A and B be finite ... |

62 | The smallest grammar problem.
- Charikar, Lehman, et al.
- 2005
Citation Context ...ime an SLP for the string, and, vice versa, from an SLP for a string one can compute in polynomial time the LZ77-encoding of the string [115]. This implies that complexity results can be transfered from SLP-encoded input strings to LZ77-encoded input strings. We will discuss the relationship between LZ77 and SLPs in more detail in Section 3. The problem of computing a small SLP for a given input string is known as grammar-based compression; it will be briefly discussed in Section 4. Although computing a size-minimal SLP for a given input string is not possible in polynomial time unless P = NP [23], there are several algorithms that compute for a given input string of length n an SLP that is only by a factor log(n) larger than a minimal grammar [23, 119, 121]. In Sections 5–10 we consider algorithmic problems for SLP-compressed strings. The following types of algorithmic problems on compressed strings will be studied: • Comparing compressed strings (Section 5): Are two compressed strings equal or similar in a certain sense? • Pattern matching for compressed strings (Section 6 and 8): Does a compressed string occur (in a certain sense) as a pattern in another compressed string? • Members... |

60 | A uniform method for proving lower bounds on the computational complexity of logical theories.
- Compton, Henson
- 1990
Citation Context ...ssions). The next theorem gives a rather complete picture on the complexity of compressed membership problems for regular expressions. It characterizes the complexity for all operator sets {·,∪} ⊆ C ⊆ {·,∪, ∗,∩,¬, ||, 2}. The class ATIME(exp(n), O(n)) denotes the class of all problems that can be solved on an alternating Turing-machine in exponential time, but the number of alternations (i.e., transitions between an existential and a universal state or vice versa) has to be bounded by O(n) on every computation path. Completeness results for ATIME(exp(n), O(n)) are typical for logical theories [28], but we are not aware of any other completeness results for this class in formal language theory. Theorem 19 ([86]) Let {·,∪} ⊆ C ⊆ {·,∪, ∗,∩,¬, ||, 2}. The compressed membership problem for C is (1) in P if C ⊆ {·,∪,∩} or C ⊆ {·,∪, ∗}, (2) NP-complete if {·,∪, ||} ⊆ C ⊆ {·,∪,∩, ||}, (3) PSPACE-complete if (i) ¬ 6∈ C or ||6∈ C and (ii) C contains {·,∪,¬} or {·,∪, 2} or {·,∪, ∗,∩} or {·,∪, ∗, ||}, and (4) ATIME(exp(n), O(n))-complete if {·,∪,¬, ||} ⊆ C. 16 The first statement (membership in P for C ⊆ {·,∪,∩} or C ⊆ {·,∪, ∗}) is clear: The case C ⊆ {·,∪, ∗} reduces to the uniform compressed mem... |

58 | A polynomial algorithm for deciding bisimilarity of normed context-free processes.
- Hirshfeld, Jerrum, et al.
- 1996
Citation Context ...ime and space for (de)compression. Such a scenario can be found for instance in large genom databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples for this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed representation we want to use. Such a r... |

55 | Greatest common divisors of polynomials given by straight-line programs.
- 1988
Citation Context ...ry node of indegree n ≥ 1 is labelled with an operation fi of arity n, • the incoming edges for a node are linearly ordered, and • there is a unique node of outdegree 0 (the output node of D). Such a circuit computes an element of A. An SLP over the terminal alphabet Γ is nothing else than a circuit over the free monoid Γ∗, where the only operation is concatenation and the constants are the symbols from the alphabet Γ. It should be noted that in the literature, the term “straight-line program” often refers to circuits over the ring of integers or, more generally, polynomial rings, see e.g. [3, 63, 70]. For some applications, an extension of SLPs called composition systems in [43] are useful. A composition system A is defined as an SLP but may also contain productions of the form X → Y [i, j] for i, j ∈ N with 1 ≤ i ≤ j. Assume that the string w = evalA(Y ) is already defined. Then we let evalA(X) = w[i,min{|w|, j}]. A composition system is in Chomsky normal form if all productions have the form X → a, X → Y Z or X → Y [i, j] for nonterminals X, Y , and Z and a terminal symbol a. The following result was shown by Hagenah in his PhD thesis [55] (in German), see also [125]. Theorem 4 From a ...

53 | Embeddings of graph braid and surface groups in right-angled Artin groups and braid groups, Algebr. Geom
- Crisp, Wiest
Citation Context ...time [58]. In all these results, polynomial time can be replaced by any complexity class that is closed under polynomial time Turing-reductions. Results (1)–(5) yield polynomial time algorithms for a quite large class of groups. First of all, (5) implies by taking G1 = G2 = · · · = Gn = Z that the compressed word problems for right-angled Artin groups (which are also known as graph groups or free partially commutative groups) can be solved in polynomial time. Due to their rich subgroup structure, right-angled Artin groups received a lot of attention in group theory during the last few years [14, 30, 46]. With (1) and (5) it follows that for every group G that virtually embeds into a right-angled Artin group (i.e., G has a finite index subgroup that embeds into a right-angled Artin group) the compressed word problem can be solved in polynomial time. Groups that virtually embed into a right-angled Artin group are also known as virtually special. Recently, it has been shown that virtually special groups form a quite large class of groups. For instance, every Coxeter group is virtually special [56] and every fully residually free group is virtually special [141].6 It follows that for every Coxet... |

50 | Efficient memory representation of XML document trees
- Busatto, Lohrey, et al.
- 2008
Citation Context ...a) Ai(y1) → Ai+1(Ai+1(y1)) for 0 ≤ i < n, An(y1) → f(y1, y1). Then eval(An) is a complete binary tree of height 2^n. Thus, |eval(An)| = 2 · 2^(2^n) − 1. Example 37 motivates the definition of linear TSLPs. A TSLP A = (N,S, P ) is linear if for every rule (A(y1, . . . , yn) → t) ∈ P , each of the parameters y1, . . . , yn occurs exactly once in the tree t. The TSLP from Example 36 is linear. In contrast to non-linear TSLPs, for a linear TSLP A the size of the tree eval(A) is bounded by 2^O(|A|). Efficient algorithms that generate for a given input tree t a linear TSLP A with eval(A) = t are described in [18, 91] and [2] (for a slightly different type of grammars). A TSLP is nothing else than a context-free tree grammar that generates a single tree. An extension of TSLPs to higher order tree grammars was recently proposed in [76]. Let us now consider algorithmic problems for TSLP-compressed trees. Concerning equality checking, we have the following: Theorem 38 ([18, 126]) For two given TSLPs A and B it can be checked in PSPACE whether eval(A) = eval(B). For two given linear TSLPs A and B it can be checked in P whether eval(A) = eval(B). For the PSPACE-bound in the first statement, one has to noti... |
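The doubly exponential blow-up of this non-linear TSLP can be checked numerically without ever building the tree. A sketch: the start production S → A_0(a), with a a constant, is an assumption here, since the excerpt is truncated before the grammar's first rule.

```python
# Size of eval for the non-linear TSLP sketched above, under the assumed
# start production S -> A_0(a):
#   A_i(y1) -> A_{i+1}(A_{i+1}(y1)) for 0 <= i < n,   A_n(y1) -> f(y1, y1).
# Sizes are computed arithmetically, never materializing the tree.

def tslp_tree_size(n):
    def size_Ai(i, t):           # size of A_i applied to an argument of size t
        if i == n:
            return 1 + 2 * t     # A_n(y1) -> f(y1, y1): one f-node, y1 twice
        # A_i(y1) -> A_{i+1}(A_{i+1}(y1)): apply A_{i+1} twice
        return size_Ai(i + 1, size_Ai(i + 1, t))
    return size_Ai(0, 1)         # the leaf a has size 1

for n in range(4):
    print(n, tslp_tree_size(n))  # matches 2 * 2**(2**n) - 1
```

The recursion mirrors the grammar: each level doubles the height, so n rules already force a tree of size 2 · 2^(2^n) − 1, while the TSLP itself has size O(n).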

48 | Probabilistic algorithms for deciding equivalence of straight-line programs.
- Ibarra, Moran
- 1983
Citation Context ...ry node of indegree n ≥ 1 is labelled with an operation fi of arity n, • the incoming edges for a node are linearly ordered, and • there is a unique node of outdegree 0 (the output node of D). Such a circuit computes an element of A. An SLP over the terminal alphabet Γ is nothing else than a circuit over the free monoid Γ∗, where the only operation is concatenation and the constants are the symbols from the alphabet Γ. It should be noted that in the literature, the term “straight-line program” often refers to circuits over the ring of integers or, more generally, polynomial rings, see e.g. [3, 63, 70]. For some applications, an extension of SLPs, called composition systems in [43], is useful.1 A composition system A is defined as an SLP but may also contain productions of the form X → Y [i, j] for i, j ∈ N with 1 ≤ i ≤ j. Assume that the string w = evalA(Y ) is already defined. Then we let evalA(X) = w[i, min{|w|, j}]. A composition system is in Chomsky normal form if all productions have the form X → a, X → Y Z or X → Y [i, j] for nonterminals X, Y , and Z and a terminal symbol a. The following result was shown by Hagenah in his PhD thesis [55] (in German), see also [125]. Theorem 4 From a ... |

43 | On the equivalence, containment, and covering problems for the regular and context-free languages.
- Rosenkrantz, Szymanski
- 1976
Citation Context ...Ps and general context-free grammars are acyclic context-free grammars (also known as non-recursive context-free grammars). These are context-free grammars such that from a nonterminal A one cannot derive in ≥ 1 steps a word containing A. In other words, we only keep condition (2) in Definition 1. Acyclic context-free grammars generate only finite sets of words. Let G be an acyclic context-free grammar of size n. Then, every word generated by G has length 2^O(n), and |L(G)| ≤ 2^(2^O(n)). Unfortunately, Theorem 9 cannot be generalized to acyclic context-free grammars, as the following result from [64] shows: Theorem 10 ([64]) It is NEXPTIME-complete to check for two given acyclic context-free grammars G1 and G2 whether L(G1) ≠ L(G2). In [108], it was shown that the problem of checking L(G1) ∩ L(G2) = ∅ for two given acyclic context-free grammars G1 and G2 is PSPACE-complete. Let us finally mention a related result of Lifshits [82], which when compared with Theorem 9 shows again the quite subtle borderline between tractability and intractability for problems on compressed strings. A function f : Σ∗ → N belongs to #P if there exists a nondeterministic polynomial time bounded Turing-machine... |

40 | Nondeterministic NC1 computation.
- Caussinus, McKenzie, et al.
- 1998
Citation Context ... This algorithm can be used to check for a given context-free grammar G in Chomsky normal form and an SLP A in DSPACE(log²(|G| + 2^|A|)), i.e., in polynomial space, whether eval(A) ∈ L(G). In [85], a fixed deterministic context-free language with a PSPACE-complete compressed membership problem was constructed using a generic reduction from the acceptance problem for polynomial space machines. Alternatively, the existence of such a language can also be deduced from Corollary 17 and the following result from [65]: There is a deterministic context-free language L such that PSPACE = LEAFlog(L). In [21], this result was sharpened: There even exists a 1-turn deterministic context-free language L such that PSPACE = LEAFlog(L). A deterministic pushdown automaton P is 1-turn, if for every input, the unique computation of P can be divided into two phases: In the first phase, the height of the pushdown does not decrease in any step, whereas in the second phase, the height of the pushdown does not increase in any step. Moreover, in [21] also a deterministic 1-counter language L (i.e., a context-free language that is accepted by a deterministic pushdown automaton with a unary pushdown alphabet) w... |

34 | On the power of polynomial time bit-reductions.
- Hertrampf, Lautemann, et al.
- 1993
Citation Context ...ship problem decreases to the circuit complexity class NC1 [35] and is therefore of the same complexity as for regular languages [9] (complexity theorists conjecture that NC1 is a proper subclass of LOGCFL). In contrast to this, by the following theorem, compressed membership is in general PSPACE-complete even for linear visibly pushdown languages, whereas it is P-complete for regular languages (Theorem 18): Theorem 23 ([88]) There is a fixed 1-turn visibly pushdown language with a PSPACE-complete compressed membership problem. This result can be deduced from Theorem 14. By the main result of [60] there exists a regular language K ⊆ {0, 1}∗ such that for every language L in PSPACE there exists a fully balanced adequate polynomial time Turing-machine M with the following properties: • There is a polynomial p(n) such that for every input w, every maximal path in the computation tree TM (w) has exactly p(|w|) many branching nodes. • For every input w, w ∈ L if and only if leaf(M,w) ∈ K.4 Hence, by Theorem 14, the following problem is PSPACE-hard: input: Two SLPs A and B over the terminal alphabet {0, 1} such that |eval(A)| = |eval(B)|. question: Does eval(A) ⊗ eval(B) ∈ ρ⁻¹(K) hold? Since... |

32 | The hardest context-free language.
- Greibach
- 1973
Citation Context ...ata and their languages have many decidability results and closure results that do not hold for general context-free languages. For instance, for every VPA V there exists a deterministic VPA V ′ with L(V ) = L(V ′), the class of visibly pushdown languages is effectively closed under boolean operations, and equivalence, inclusion, and intersection non-emptiness are decidable for VPAs, see [6], where precise complexity results are also shown. Moreover, the membership problem also seems to be easier for visibly pushdown languages than for general context-free languages. By a classical result from [51], there exists a context-free language with a LOGCFL-complete membership problem. For visibly pushdown languages the complexity of the membership problem decreases to the circuit complexity class NC1 [35] and is therefore of the same complexity as for regular languages [9] (complexity theorists conjecture that NC1 is a proper subclass of LOGCFL). In contrast to this, by the following theorem, compressed membership is in general PSPACE-complete even for linear visibly pushdown languages, whereas it is P-complete for regular languages (Theorem 18): Theorem 23 ([88]) There is a fixed 1-turn visib... |

32 | Pattern-matching for strings with short description
- Karpinski, Rytter, et al.
- 1995
Citation Context ...led the pattern) and t (usually called the text), whether p is a factor of t. There are dozens of linear time algorithms for this problem on uncompressed strings, most notably the well-known Knuth-Morris-Pratt algorithm [75]. It is therefore natural to ask whether a polynomial time algorithm for pattern matching on SLP-compressed strings exists; this problem is sometimes called fully compressed pattern matching and is defined as follows: input: Two SLPs P and T question: Is eval(P) a factor of eval(T)? The first polynomial time algorithm for fully compressed pattern matching was presented in [73], further improvements with respect to the running time were achieved in [42, 67, 81, 106]. Theorem 12 ([42, 67, 73, 81, 106]) Fully compressed pattern matching can be solved in polynomial time. Basically, the algorithms from [42, 67, 73, 81, 106] can be divided into two classes: those that are based on the periodicity lemma of Fine and Wilf [39] mentioned in the previous section [42, 73, 81, 106] and Jez’s solution from [67] that does not use any non-trivial combinatorial properties of words. An important observation that is crucial in [42, 67, 73, 81, 106] is the obvious fact that if a patte... |

32 | Processing compressed texts: A tractability border
- Lifshits
- 2007
Citation Context ... There are dozens of linear time algorithms for this problem on uncompressed strings, most notably the well-known Knuth-Morris-Pratt algorithm [75]. It is therefore natural to ask whether a polynomial time algorithm for pattern matching on SLP-compressed strings exists; this problem is sometimes called fully compressed pattern matching and is defined as follows: input: Two SLPs P and T question: Is eval(P) a factor of eval(T)? The first polynomial time algorithm for fully compressed pattern matching was presented in [73], further improvements with respect to the running time were achieved in [42, 67, 81, 106]. Theorem 12 ([42, 67, 73, 81, 106]) Fully compressed pattern matching can be solved in polynomial time. Basically, the algorithms from [42, 67, 73, 81, 106] can be divided into two classes: those that are based on the periodicity lemma of Fine and Wilf [39] mentioned in the previous section [42, 73, 81, 106] and Jez’s solution from [67] that does not use any non-trivial combinatorial properties of words. An important observation that is crucial in [42, 67, 73, 81, 106] is the obvious fact that if a pattern p is a factor of eval(T) then there exists a production X → Y Z in T such that p ... |

31 | Random access to grammar-compressed strings.
- Bille, Landau, et al.
- 2011
Citation Context ...ed pattern matching for LZ1 [45] first transforms the LZ1-encoding of the text into an α-balanced SLP using the method from [23] (see Theorem 6) and then runs an algorithm for semi-compressed pattern matching for α-balanced SLPs. Several other algorithmic problems that are related to compressed pattern matching were studied. In [25, 26, 40], self-indexes for SLP-compressed strings are constructed. The goal of such a self-index is to store an SLP A with small space overhead in such a form that for a given uncompressed pattern p one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: Here, for a given pattern p, SLP T, and k ∈ N the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The problem of computing q-gram frequencies for SLP-compressed strings is studied in [49, 50]. A q-gram is just a string of length q. An algorithm that computes in time O(q · n) for a Chomsky normal form SLP A of size n a list with the frequencies of all q-grams occ... |
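The O(q · n) q-gram counting idea can be sketched as follows (the production encoding is a hypothetical choice for illustration; q ≥ 2 is assumed): every occurrence of a q-gram crosses the cut of exactly one derivation-tree node X → Y Z, so it suffices to scan the length-(q−1) suffix of eval(Y) glued to the length-(q−1) prefix of eval(Z), weighted by the number of occurrences of X in the derivation tree.

```python
from collections import Counter

def qgram_frequencies(prods, start, q):
    """Sketch of q-gram counting for a CNF SLP without decompression
    (assumes q >= 2). prods maps a nonterminal to a terminal character
    (X -> a) or to a pair (Y, Z) for X -> Y Z (a hypothetical encoding)."""
    order, seen = [], set()
    def visit(x):                          # children before parents
        if x in seen:
            return
        seen.add(x)
        if isinstance(prods[x], tuple):
            visit(prods[x][0]); visit(prods[x][1])
        order.append(x)
    visit(start)

    pre, suf = {}, {}                      # length-(q-1) prefix/suffix of eval(X)
    for x in order:
        rhs = prods[x]
        if isinstance(rhs, str):
            pre[x] = suf[x] = rhs
        else:
            y, z = rhs
            pre[x] = (pre[y] + pre[z])[:q - 1]
            suf[x] = (suf[y] + suf[z])[-(q - 1):]

    occ = Counter({start: 1})              # occurrences in the derivation tree
    for x in reversed(order):              # parents before children
        rhs = prods[x]
        if isinstance(rhs, tuple):
            occ[rhs[0]] += occ[x]
            occ[rhs[1]] += occ[x]

    freq = Counter()
    for x in order:
        rhs = prods[x]
        if isinstance(rhs, tuple):
            y, z = rhs
            w = suf[y] + pre[z]            # every q-gram here crosses the cut
            for i in range(len(w) - q + 1):
                freq[w[i:i + q]] += occ[x]
    return freq

slp = {'a': 'a', 'b': 'b', 'X': ('a', 'b'), 'Y': ('X', 'X'), 'S': ('Y', 'Y')}
print(qgram_frequencies(slp, 'S', 2))      # eval(S) = abababab
```

Each nonterminal contributes a window of length at most 2(q−1), so the total work is O(q · n) even when eval(A) is exponentially long.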

31 | Word problems solvable in logspace.
- Lipton, Zalcstein
- 1977
Citation Context ... finitely presented groups (i.e., finitely generated groups with only finitely many defining relations) with an undecidable word problem. Despite this negative result, many natural classes of groups with decidable word problems were found. Prominent examples are for instance finitely generated linear groups, automatic groups [36], and one-relator groups. With the advent of computational complexity theory, the complexity of word problems also became an active research area. For instance, it was shown that for a finitely generated linear group the word problem can be solved in logarithmic space [84, 130], that automatic groups have polynomial time solvable (in fact quadratic) word problems [36], and that the word problem for a one-relator group is primitive recursive [19]. For a group G let Aut(G) be the automorphism group of G. In general, Aut(G) is not necessarily finitely generated even if G is finitely generated. For the investigation of the complexity of the word problem of a finitely generated subgroup of Aut(G), a compressed variant of the word problem for G turned out to be useful. Assume again that G is finitely generated by Σ. The compressed word problem of G with respect to Σ is th... |

28 | Computing procedure summaries for interprocedural analysis
- Gulwani, Tiwari
- 2007
Citation Context ...sentation in order to save the time and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples for this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed rep... |

26 | On the complexity of pattern matching for highly compressed two-dimensional texts
- Berman, Karpinski, et al.
- 2002
Citation Context ...imensions match. Define the dimension of a nonterminal X inductively as follows. If the unique production for X has the form X → a, then the dimension of X is (1, 1). If the unique production for X has the form X → (Y,Z), then, inductively the dimension (n,m) (resp., (k, ℓ)) of Y (resp., Z) is already defined. We require that n = k and define the dimension of X as (n,m+ ℓ). The constraint for a vertical composition X → (Z over Y) is analogous. If the dimension of the initial nonterminal S is (n,m), then the 2SLP A defines a picture eval(A) of dimension (n,m). Algorithmic problems for 2SLPs were studied in [11]. For equality checking, the following was shown; recall the definition of the complexity class coRP from Section 11. Theorem 40 ([11]) The problem of checking eval(A) = eval(B) for two given 2SLPs belongs to the complexity class coRP. For the proof, one encodes a picture p of dimension (m,n) over the alphabet {1, . . . , k} as the polynomial polyp(x, y) = ∑_{1≤i≤m} ∑_{1≤j≤n} p(i, j) x^(i−1) y^(j−1). For two 2SLPs A and B, we have eval(A) = eval(B) if and only if polyeval(A)(x, y) − polyeval(B)(x, y) is the zero polynomial. Now, from a given 2SLP A one can construct in polynomial time a circuit over the pol... |
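The same "encode, then test identity" idea has a simple one-dimensional analogue: a Karp-Rabin style fingerprint of eval(A) can be computed bottom-up from an SLP without decompressing, giving a randomized equality test in the spirit of the coRP bound. This is an illustration of the principle, not the 2SLP construction of [11].

```python
import random

def slp_fingerprint(prods, start, p, b):
    """Karp-Rabin fingerprint of eval(SLP) mod prime p with base b, computed
    bottom-up. prods maps a nonterminal to a terminal character or to a
    pair (Y, Z) for X -> Y Z (a hypothetical encoding for illustration)."""
    h, ln = {}, {}
    def go(x):
        if x in h:
            return
        rhs = prods[x]
        if isinstance(rhs, str):
            h[x], ln[x] = ord(rhs) % p, 1
        else:
            y, z = rhs
            go(y); go(z)
            ln[x] = ln[y] + ln[z]
            # hash(uv) = hash(u) * b^|v| + hash(v)  (mod p)
            h[x] = (h[y] * pow(b, ln[z], p) + h[z]) % p
    go(start)
    return ln[start], h[start]

p = (1 << 61) - 1                  # a Mersenne prime
b = random.randrange(2, p)
# Two structurally different SLPs for the same string "abab":
A = {'a': 'a', 'b': 'b', 'X': ('a', 'b'), 'S': ('X', 'X')}
B = {'a': 'a', 'b': 'b', 'U': ('b', 'a'), 'V': ('a', 'U'), 'S': ('V', 'b')}
print(slp_fingerprint(A, 'S', p, b) == slp_fingerprint(B, 'S', p, b))
```

Equal strings always yield equal fingerprints; unequal strings collide only with small probability over the random base, matching the one-sided error of a coRP-style test.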

26 | A note on the space complexity of some decision problems for finite automata.
- Jiang, Ravikumar
- 1991
Citation Context ...9]. In this paper, Barrington proved that the word problem for every finite non-solvable group is complete for the circuit complexity class NC1. 9.2 Compressed membership for regular expressions When the regular language is not given by an automaton but by a regular expression, the complexity of the compressed membership problem depends on the operators that we allow in expressions. The same phenomenon is already well-known for uncompressed strings. For instance, whereas the membership problem for ordinary regular expressions (where only the Kleene-operators ·,∪, ∗ are allowed) is NL-complete [69], membership for semi-extended regular expressions (where in addition also intersection is allowed) is LOGCFL-complete [111]. Let us be a bit more formal: For a set C of language operations, C-expressions over the alphabet Σ are built up from constants in Σ ∪ {ε} using the operations from C. Thus, ordinary regular expressions are {·,∪, ∗}-expressions. The language defined by an expression ρ is L(ρ). The length |ρ| of ρ is defined as follows: for ρ ∈ Σ ∪ {ε} set |ρ| = 1. If ρ = op(ρ1, . . . , ρn), where op is an n-ary operator, we set |ρ| = 1 + |ρ1| + · · · + |ρn|. For a set C of language operations... |

25 | The geometry and topology of reconfiguration.
- Ghrist, Peterson
- 2007
Citation Context ...time [58]. In all these results, polynomial time can be replaced by any complexity class that is closed under polynomial time Turing-reductions. Results (1)–(5) yield polynomial time algorithms for a quite large class of groups. First of all, (5) implies by taking G1 = G2 = · · · = Gn = Z that the compressed word problems for right-angled Artin groups (which are also known as graph groups or free partially commutative groups) can be solved in polynomial time. Due to their rich subgroup structure, right-angled Artin groups received a lot of attention in group theory during the last few years [14, 30, 46]. With (1) and (5) it follows that for every group G that virtually embeds into a right-angled Artin group (i.e., G has a finite index subgroup that embeds into a right-angled Artin group) the compressed word problem can be solved in polynomial time. Groups that virtually embed into a right-angled Artin group are also known as virtually special. Recently, it has been shown that virtually special groups form a quite large class of groups. For instance, every Coxeter group is virtually special [56] and every fully residually free group is virtually special [141].6 It follows that for every Coxet... |

24 | Logspace and logtime leaf languages.
- Jenner, McKenzie, et al.
- 1996
Citation Context ... class LEAFp(L) if there exists an adequate nondeterministic polynomial time Turing-machine M with input alphabet Σ such that for every word w ∈ Σ∗ we have: w ∈ K if and only if leaf(M,w) ∈ L. In this way, many complexity classes can be defined in a uniform way. Here are a few examples: • NP = LEAFp({0, 1}∗1{0, 1}∗) • coNP = LEAFp(1∗) • PP = LEAFp({x ∈ {0, 1}∗ | |x|1 > |x|0}) (PP stands for probabilistic polynomial time: if we view a nondeterministic Turing machine as a probabilistic machine that chooses successor configurations with uniform distribution, then PP can be defined as the class of all languages L for which there exists a probabilistic machine M such that for all inputs x: x ∈ L if and only if M accepts x with probability > 1/2.) Of course, the leaf language concept is not restricted to nondeterministic polynomial time Turing-machines. In the context of SLP-compression, logspace leaf language classes [65] turned out to be useful: A language K ⊆ Σ∗ belongs to the class LEAFlog(L) if there exists an adequate nondeterministic logspace Turing-machine M with input alphabet Σ such that for every word w ∈ Σ∗ we have: w ∈ K if and only if leaf(M,w) ∈ L. Note that the machine M ... |

22 | A faster grammarbased self-index.
- Gagie, Gawrychowski, et al.
- 2012
Citation Context ...r p is a factor of suffY prefZ , which is a string of length at most 2|p|. This is again possible in time O(|p| · |T|) using any linear time pattern matching algorithm for uncompressed strings. It should also be remarked that the currently best algorithm for semi-compressed pattern matching for LZ1 [45] first transforms the LZ1-encoding of the text into an α-balanced SLP using the method from [23] (see Theorem 6) and then runs an algorithm for semi-compressed pattern matching for α-balanced SLPs. Several other algorithmic problems that are related to compressed pattern matching were studied. In [25, 26, 40], self-indexes for SLP-compressed strings are constructed. The goal of such a self-index is to store an SLP A with small space overhead in such a form that for a given uncompressed pattern p one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: Here, for a given pattern p, SLP T, and k ∈ N the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The ... |

22 | Collage system: a unifying framework for compressed pattern matching.
- Kida, Matsumoto, et al.
- 2003
Citation Context ... composition system A in Chomsky normal form with n nonterminals one can compute in time O(n²) an SLP B of size O(n²) such that eval(B) = eval(A). 3 Straight-line programs and dictionary-based compression As already remarked in the Introduction, there is a tight relationship between straight-line programs and dictionary-based compression, in particular, the LZ77 compression scheme. There are various variants of LZ77 compression. The following variant is used for instance in [119]. (The formalism in [43] differs in some minor details. Moreover, composition systems are called collage systems in [74] and interval grammars in [55].) The LZ77-factorization of a non-empty word s ∈ Σ+ is defined as s = f1f2 · · · fm, where for all 1 ≤ i ≤ m, fi is the longest non-empty prefix of fifi+1 · · · fm that is a factor of f1f2 · · · fi−1 in case such a prefix exists; otherwise fi is the first symbol of fifi+1 · · · fm. Then, the LZ77-encoding of w is the sequence c1c2 · · · cm such that for all 1 ≤ i ≤ m, either |fi| = 1 and ci = fi or |fi| ≥ 2, fi = (f1f2 · · · fi−1)[p, q] and fi ≠ (f1f2 · · · fi−1)[k, k + q − p] for all 1 ≤ k < p. Theorem 5 ([23, 119]) For every SLP A, the number of factors in the LZ... |
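A naive quadratic implementation of this LZ77-factorization makes the definition concrete (real implementations achieve O(log(|Σ|) · |w|) with suffix trees, as the survey notes):

```python
def lz77_factorize(s):
    """Naive computation of the LZ77-factorization s = f1 f2 ... fm defined
    above: fi is the longest non-empty prefix of the remainder that is a
    factor of f1 ... f_{i-1}; if no such prefix exists, fi is one symbol.
    Quadratic time; only meant to illustrate the definition."""
    factors, i = [], 0
    while i < len(s):
        longest = 0
        # prefixes of occurring factors also occur, so we can stop early
        for l in range(1, len(s) - i + 1):
            if s[i:i + l] in s[:i]:
                longest = l
            else:
                break
        l = max(longest, 1)
        factors.append(s[i:i + l])
        i += l
    return factors

print(lz77_factorize("aacabcabcabc"))
```

Note that this variant only allows a factor to refer back into f1 · · · f_{i−1}, i.e., no self-referential overlap, exactly as in the definition above.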

22 | Speeding up HMM decoding and training by exploiting sequence repetitions.
- Mozes, Weimann, Ziv-Ukelson
- 2007
Citation Context ...is strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed representation we want to use. Such a representation should have two properties: (i) it should cover many compression schemes from practice and (ii) it should be mathematically easy to handle. Straight-line programs (SLPs) have both of these properties. An SLP is a context-free grammar that generates exactly one word. In an SLP, repeated subpatterns in a string have to be represented only once by introducing a non-terminal for the pattern. It i... |

21 | Model checking a path.
- Markey, Schnoebelen
- 2003
Citation Context ... ||}-expressions. But the latter problem is known to be NP-complete; in fact this already holds for {·, ||}-expressions [104, 140], which yields statement (2) of Theorem 19. The upper complexity bounds in statements (3) and (4) of Theorem 19 can be shown by rather straightforward algorithms (using for (3) the well-known correspondence between PSPACE and alternating polynomial time). Let us now turn to the lower bounds in (3) and (4) from Theorem 19. PSPACE-hardness for {·,∪,¬} is shown in [86] by a reduction from the quantified boolean satisfiability problem; the reduction is mainly taken from [99], where it was shown that model checking the temporal logic LTL on SLP-compressed strings is PSPACE-complete. PSPACE-hardness for {·,∪, 2} and {·,∪, ∗,∩} is shown by generic reductions from the acceptance problem for a polynomial space bounded machine. Finally, the PSPACE lower bound for {·,∪, ∗, ||} is proved by a reduction from the intersection nonemptiness problem for regular expressions (over the standard operator set {·,∪, ∗}), which is a well-known PSPACE-complete problem [77]. The ATIME(exp(n), O(n))-hardness part from (4) uses a corresponding result from [89] on the model-checking pro... |

20 | The complexity of tree automata and XPath on grammar-compressed trees
- Lohrey, Maneth
Citation Context ... ∈ Σ a natural number (the rank of a) is associated. If a tree node v is labelled with a symbol of rank n, then v has exactly n children, which are linearly ordered. Such trees can conveniently be represented as terms. The size |t |of a tree t is the number of nodes of t. Here is an example: Example 35 Let f be a symbol of rank 2, h a symbol of rank 1, and a a symbol of rank 0 (a constant). Then the term h(f(h(f(h(h(a)), a)), h(f(h(h(a)), a)))) corresponds to the node-labelled tree of size 14, shown in Figure 4. A tree straight-line program (TSLP for short and also called SLCF tree grammar in [90, 92]) over the terminal alphabet Σ (which is a ranked alphabet in the above sense) is a tuple A = (N,Σ, S, P ), such that N is a finite set of ranked symbols (the nonterminals), S ∈ N has rank 0 (the initial nonterminal) and P is a finite set of productions of the form A(y1, . . . , yn) → t where A is a nonterminal of rank n and t is a tree built up from the ranked symbols in Σ ∪ N and the parameters y1, . . . , yn which are considered as symbols of rank 0 (i.e., constants). Moreover, as for ordinary SLPs, it is required that every nonterminal occurs on the left-hand side of exactly one production... |

19 | Self-indexed grammar-based compression.
- Claude, Navarro
- 2011
Citation Context ...r p is a factor of suffY prefZ , which is a string of length at most 2|p|. This is again possible in time O(|p| · |T|) using any linear time pattern matching algorithm for uncompressed strings. It should also be remarked that the currently best algorithm for semi-compressed pattern matching for LZ1 [45] first transforms the LZ1-encoding of the text into an α-balanced SLP using the method from [23] (see Theorem 6) and then runs an algorithm for semi-compressed pattern matching for α-balanced SLPs. Several other algorithmic problems that are related to compressed pattern matching were studied. In [25, 26, 40], self-indexes for SLP-compressed strings are constructed. The goal of such a self-index is to store an SLP A with small space overhead in such a form that for a given uncompressed pattern p one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: Here, for a given pattern p, SLP T, and k ∈ N the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The ... |

19 | Word problems and membership problems on compressed words
- Lohrey
Citation Context ... straightforward to define an SLP B in Chomsky normal form of size 2n such that |eval(B)| ≥ 2^n. Hence, an SLP can be seen as a compressed representation of the string it generates, and exponential compression rates can be achieved in this way. One may also allow exponential expressions of the form Ai for A ∈ V and i ∈ N in right-hand sides of productions. Here the number i is coded binary. Such an expression can be replaced by a sequence of ordinary productions, where the length of that sequence is bounded polynomially in the length of the binary coding of i. The following construction from [85] is often useful for proving lower complexity bounds, see e.g. the end of Section 5. Example 3 Let w = (w1, . . . , wn) be a tuple of natural numbers. For a bit vector x ∈ {0, 1}^n of length n let us define x · w = x1w1 + x2w2 + · · · + xnwn. Let 1k be the constant-1 vector (1, 1, . . . , 1) of length k and let sk = 1k · (w1, . . . , wk) = w1 + · · · + wk for 1 ≤ k ≤ n. Let s = sn = w1 + w2 + · · · + wn. Finally, define the string S(w) = ∏_{x∈{0,1}^n} a^(x·w) b a^(s−x·w). Here, the product ∏_{x∈{0,1}^n} means that we concatenate all strings a^(x·w) b a^(s−x·w) for x ∈ {0, 1}^n and the order of concatenation is the l... |
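The string S(w) can be generated directly for small instances. A sketch: the order of concatenation is truncated out of the excerpt, so lexicographic order over the bit vectors x is assumed here.

```python
from itertools import product

def S(w):
    """Builds S(w) = concatenation over x in {0,1}^n of a^(x.w) b a^(s - x.w),
    with s = w1 + ... + wn, enumerating x in lexicographic order (an
    assumption; the excerpt truncates before fixing the order)."""
    s = sum(w)
    blocks = []
    for x in product((0, 1), repeat=len(w)):
        d = sum(xi * wi for xi, wi in zip(x, w))   # d = x . w
        blocks.append('a' * d + 'b' + 'a' * (s - d))
    return ''.join(blocks)

print(S((1, 2)))   # 2^2 blocks, each of length s + 1 = 4
```

Every block has length s + 1, so |S(w)| = 2^n (s + 1); the position of the b inside each block encodes the subset sum x · w, which is what makes this string useful for lower-bound reductions.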

17 | Window subsequence problems for compressed texts.
- Cegielski, Guessarian, et al.
- 2006
Citation Context ...ed subsequence matching that works in time O(|p| · |T|), see [133, 142]. Assume that the SLP T is in Chomsky normal form. Let p = a1 · · · am. The idea is to compute for every position 1 ≤ i ≤ m and every variable X of T the length ℓ(i,X) of the longest prefix of ai · · · am that is a subsequence of evalT(X). The values ℓ(i,X) satisfy a simple recursion, which allows one to compute all these values in time O(|p| · |T|). More complex problems related to semi-compressed subsequence matching (e.g. counting the number of shortest factors of eval(T) that contain p as a subsequence) are considered in [22, 134, 142]. The structural complexity of a generalization of semi-compressed subsequence matching was studied by Markey and Schnoebelen [100]. LOGCFL is the class of all languages that are logspace reducible to a context-free language [132]; it coincides with the class of all languages that can be recognized by a uniform boolean circuit family of polynomial size and logarithmic depth, where all ∧-gates have bounded fan-in and all ∨-gates have unbounded fan-in [137]. It is conjectured that LOGCFL is a proper subclass of P. Theorem 16 ([100]) The following problem belongs to LOGCFL: input: Strings p0, p1 ... |
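The recursion for ℓ(i,X) can be written out directly. A sketch with a hypothetical SLP encoding; indices are 0-based in the code, and for X → Y Z the recursion is ℓ(i, X) = ℓ(i, Y) + ℓ(i + ℓ(i, Y), Z).

```python
def is_subsequence_of_slp(p, prods, start):
    """Semi-compressed subsequence matching in O(|p| * |T|):
    l[(i, X)] = length of the longest prefix of p[i:] that is a
    subsequence of eval_T(X). prods maps a nonterminal to a terminal
    character (X -> a) or a pair (Y, Z) for X -> Y Z."""
    m = len(p)
    order, seen = [], set()
    def visit(x):                    # children before parents
        if x in seen:
            return
        seen.add(x)
        if isinstance(prods[x], tuple):
            visit(prods[x][0]); visit(prods[x][1])
        order.append(x)
    visit(start)

    l = {}
    for x in order:
        rhs = prods[x]
        for i in range(m, -1, -1):
            if i == m:
                l[i, x] = 0
            elif isinstance(rhs, str):
                l[i, x] = 1 if p[i] == rhs else 0
            else:
                y, z = rhs
                k = l[i, y]          # consume as much of p[i:] in Y ...
                l[i, x] = k + l[i + k, z]   # ... then continue in Z
    return l[0, start] == m

slp = {'a': 'a', 'b': 'b', 'X': ('a', 'b'), 'S': ('X', 'X')}  # eval = abab
print(is_subsequence_of_slp("aab", slp, 'S'))
print(is_subsequence_of_slp("bbb", slp, 'S'))
```

The table has (m + 1) · |T| entries, each filled in constant time, which gives the O(|p| · |T|) bound stated above.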

16 | Efficient algorithms for Lempel–Ziv encoding (extended abstract).
- Gasieniec, Karpinski, et al.
- 1996
Citation Context ...gy can be solved in polynomial time; this is the topic of Section 13. These problems mainly concern curves in 2-dimensional surfaces that are represented by so-called normal coordinates. With the help of SLPs, it was shown for instance that Dehn twists and geometric intersection numbers of curves that are given in normal coordinates can be computed in polynomial time [124]. Finally, in Section 15 we briefly discuss algorithmic problems for extensions of SLPs to trees (Section 15.1) and 2-dimensional pictures (Section 15.2). Previous surveys on algorithms for compressed strings can be found in [42, 116, 120]. 2 Preliminaries 2.1 General notations Let Γ be a finite alphabet. The empty word is denoted by ε. Let s = a_1 a_2 · · · a_n ∈ Γ^* be a word over Γ, where a_i ∈ Γ for 1 ≤ i ≤ n. The set of symbols occurring in s is alph(s) = {a_1, . . . , a_n}. The length of s is |s| = n. For a ∈ Γ and 1 ≤ i ≤ j ≤ n let |s|_a = |{k | a_k = a}| be the number of a's in s. We also write s[i] = a_i, and s[i, j] = a_i a_{i+1} · · · a_j. If i > j we set s[i, j] = ε. Any word s[i, j] is called a factor of s. We assume some background in complexity theory [110]. In particular, the reader should be familiar with the complexity clas... |

15 | Improved grammar-based compressed indexes.
- Claude, Navarro
- 2011
Citation Context ...r p is a factor of suff_Y pref_Z, which is a string of length at most 2|p|. This is again possible in time O(|p| · |T|) using any linear time pattern matching algorithm for uncompressed strings. It should also be remarked that the currently best algorithm for semi-compressed pattern matching for LZ1 [45] first transforms the LZ1-encoding of the text into an α-balanced SLP using the method from [23] (see Theorem 6) and then runs an algorithm for semi-compressed pattern matching for α-balanced SLPs. Several other algorithmic problems that are related to compressed pattern matching were studied. In [25, 26, 40], self-indexes for SLP-compressed strings are constructed. The goal of such a self-index is to store an SLP A with small space overhead in such a form that for a given uncompressed pattern p one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: Here, for a given pattern p, SLP T, and k ∈ N the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The ... |

15 | Randomized efficient algorithms for compressed strings: The finger-print approach (extended abstract).
- Gasieniec, Karpinski, et al.
- 1996
Citation Context ...g edges for a node are linearly ordered, and • there is a unique node of outdegree 0 (the output node of D). Such a circuit computes an element of A. An SLP over the terminal alphabet Γ is nothing other than a circuit over the free monoid Γ^*, where the only operation is concatenation and the constants are the symbols from the alphabet Γ. It should be noted that in the literature, the term "straight-line program" often refers to circuits over the ring of integers or, more generally, polynomial rings, see e.g. [3, 63, 70]. For some applications, an extension of SLPs, called composition systems in [43], is useful.1 A composition system A is defined as an SLP but may also contain productions of the form X → Y[i, j] for i, j ∈ N with 1 ≤ i ≤ j. Assume that the string w = eval_A(Y) is already defined. Then we let eval_A(X) = w[i, min{|w|, j}]. A composition system is in Chomsky normal form if all productions have the form X → a, X → YZ or X → Y[i, j] for nonterminals X, Y, and Z and a terminal symbol a. The following result was shown by Hagenah in his PhD thesis [55] (in German), see also [125]. Theorem 4 From a given composition system A in Chomsky normal form with n nonterminals one can co... |
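The semantics of the cut productions X → Y[i, j] can be illustrated with a naive evaluator (encoding hypothetical; the real algorithms of course never expand the string, which may be exponentially long in the size of the system):

```python
def eval_cs(rules, X):
    """Naive composition-system evaluator. rules[X] is a terminal character,
    a pair (Y, Z) for X -> YZ, or a triple (Y, i, j) for X -> Y[i, j]
    with 1-based positions and j truncated to |eval(Y)| as in the definition."""
    rhs = rules[X]
    if isinstance(rhs, str):                      # X -> a
        return rhs
    if len(rhs) == 2:                             # X -> Y Z
        return eval_cs(rules, rhs[0]) + eval_cs(rules, rhs[1])
    Y, i, j = rhs                                 # X -> Y[i, j]
    w = eval_cs(rules, Y)
    return w[i - 1:min(len(w), j)]

rules = {"A": "a", "B": "b", "X": ("A", "B"), "S": ("X", "X"), "T": ("S", 2, 3)}
print(eval_cs(rules, "T"))  # eval(S) = "abab", so eval(T) = "ba"
```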

15 | Satisfiability of word equations with constants is in exponential space
- Gutiérrez
- 1998
Citation Context ...and his goal was to prove the undecidability of Hilbert's 10th problem by proving undecidability of solvability of word equations. Whereas Hilbert's 10th problem was finally shown to be undecidable by the seminal work of Matiyasevich from 1970, solvability of word equations was shown to be decidable by Makanin [98] in 1977. Makanin's algorithm is very complicated, both with respect to its computational complexity and its technical difficulty. Over the years, the estimate on the time and space complexity of Makanin's algorithm was gradually improved. Currently, the best known bound is EXPSPACE [54]. In 1999, Plandowski came up with an alternative algorithm for checking solvability of word equations, which put the problem into PSPACE [113]. This is still the best upper complexity bound. The best known lower bound is NP, and it was repeatedly conjectured that the precise complexity is NP too. Now, consider the situation where for every variable x ∈ X a length constraint ℓ(x) ∈ N is specified in binary representation. Consider the question whether a given word equation u = v has a solution that respects the length constraints. Clearly, this problem is decidable. In fact, the naive algorit... |

14 | Makanin’s Algorithm
- Diekert
- 2001
Citation Context ...(a) = a for a ∈ Σ and σ̂(x) = σ(x). A solution of the word equation (u, v) is a mapping σ : X → Σ^* such that σ(u) = σ(v). In the following, we write a word equation (u, v) as u = v. A system of word equations over (Σ, X) is a conjunction S = ∧_{i=1}^{n} u_i = v_i of word equations over (Σ, X). If σ is a solution of every word equation u_i = v_i, then σ is called a solution of S. Standard arguments show that from a given system of word equations S one can compute (in logspace) a single word equation u = v such that the set of solutions of u = v is equal to the set of solutions of the system S, see e.g. [33]. Deciding whether a given word equation has a solution turned out to be very difficult. In the 1950s, Markov conjectured that it is undecidable whether a given word equation has a solution. He knew that solvability of word equations can be reduced to Hilbert's 10th problem, and his goal was to prove the undecidability of Hilbert's 10th problem by proving undecidability of solvability of word equations. Whereas Hilbert's 10th problem was finally shown to be undecidable by the seminal work of Matiyasevich from 1970, solvability of word equations was shown to be decidable by Makanin [98] in 197... |

14 | Coxeter groups are virtually special
- Haglund, Wise
Citation Context ...-angled Artin groups received a lot of attention in group theory during the last few years [14, 30, 46]. With (1) and (5) it follows that for every group G that virtually embeds into a right-angled Artin group (i.e., G has a finite index subgroup that embeds into a right-angled Artin group) the compressed word problem can be solved in polynomial time. Groups that virtually embed into a right-angled Artin group are also known as virtually special. Recently, it has been shown that virtually special groups form a quite large class of groups. For instance, every Coxeter group is virtually special [56] and every fully residually free group is virtually special [141].6 It follows that for every Coxeter group as well as for every fully residually free group the compressed word problem can be solved in polynomial time. For fully residually free groups, this result was directly shown by Macdonald [97] using a generalization of the technique for free groups. For finitely generated linear groups, the word problem can be solved in logarithmic space [84, 130]. We currently do not know whether also for these groups the compressed word problem can be solved in polynomial time. The best we can achieve... |

14 | Querying and embedding compressed texts.
- Lifshits, Lohrey
- 2006
Citation Context ...y solved in polynomial time: (1) Given an SLP A (w.l.o.g. in Chomsky normal form), calculate |eval(A)|: One introduces for each nonterminal A of the SLP A a variable n_A which takes values in N. For a production A → a of A we set n_A = 1, and for a production A → BC of A we set n_A = n_B + n_C. In this way, the length |eval(A)| can be computed with |A| many additions. Moreover, the number of bits of each value n_A is bounded by O(log(|eval_A(A)|)) ≤ O(|A|). (2) Given an SLP A (w.l.o.g. in Chomsky normal form) and a number 1 ≤ i ≤ |eval(A)|, calculate eval(A)[i] (this problem is in fact P-complete [82], see Section 10): Basically, the algorithm walks down the derivation tree for A to position i. At each stage, the algorithm stores a number p and a nonterminal A of A such that 1 ≤ p ≤ |eval_A(A)|. Initially, p = i, and A is the initial nonterminal of A. If the unique production for A is A → BC, then there are two cases: If 1 ≤ p ≤ |eval_A(B)|, then we continue with position p and the nonterminal B. Otherwise, we have |eval_A(B)| + 1 ≤ p ≤ |eval_A(A)|, and we continue with position p − |eval_A(B)| (this number can be computed using the algorithm from the previous point) and the nonterminal C. F... |
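Both procedures are easy to sketch (the dictionary encoding of the CNF SLP is hypothetical; positions here are 0-based for brevity):

```python
def slp_lengths(rules):
    """n_A = |eval(A)| for every nonterminal: one addition per production."""
    lengths = {}
    def n(A):
        if A not in lengths:
            rhs = rules[A]
            lengths[A] = 1 if isinstance(rhs, str) else n(rhs[0]) + n(rhs[1])
        return lengths[A]
    for A in rules:
        n(A)
    return lengths

def slp_char(rules, start, i):
    """eval(start)[i] by walking down the derivation tree, as in point (2)."""
    lengths = slp_lengths(rules)
    A, p = start, i
    while True:
        rhs = rules[A]
        if isinstance(rhs, str):
            return rhs
        B, C = rhs
        if p < lengths[B]:      # position lies inside eval(B)
            A = B
        else:                   # otherwise shift the position into eval(C)
            p -= lengths[B]
            A = C

rules = {"S": ("X", "X"), "X": ("A", "B"), "A": "a", "B": "b"}  # eval(S) = "abab"
print(slp_lengths(rules)["S"], slp_char(rules, "S", 2))  # 4 a
```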

12 | Context matching for compressed terms
- Gascón, Godoy, et al.
- 2008
Citation Context ...(A) [90]. • It is P-complete to check for a given linear TSLP A and a given tree automaton A, whether A accepts eval(A) [92]. The polynomial time algorithm in the linear case works in two steps: (1) Transform in polynomial time the input TSLP A into a TSLP B such that eval(A) = eval(B) and every nonterminal of B has rank at most one. This is the difficult step in [92]. (2) Evaluate the input tree automaton A on eval(B) in polynomial time, using the fact that every nonterminal of B has rank at most one, see [90]. Several other algorithmic problems for TSLP-compressed input trees are studied in [20, 41, 80, 127, 128]. 15.2 Pictures A picture over the alphabet Σ is a mapping p : {1, . . . , n} × {1, . . . , m} → Σ for some n, m ≥ 1. We say that (n, m) is the dimension of the picture p. For more information on pictures see [47]. A picture can be viewed as a 2-dimensional word. We define two partial concatenation operations for pictures. Given two pictures p1 and p2, where p1 has dimension (n, m) and p2 has dimension (n, k), we define the horizontal concatenation (p1, p2) : {1, . . . , n} × {1, . . . , m + k} → Σ (a picture of dimension (n, m + k)) as follows, where x ∈ {1, . . . , n} and y ∈ {1, . . . , m + k}: (p... |

11 | Pattern matching in dynamic texts.
- Alstrup, Brodal, et al.
- 2000
Citation Context ...signatures. A signature corresponds to a nonterminal of an SLP. The signature of a string is computed by iteratively breaking up the sequence into small blocks, which are encoded by integers using a pairing function. Adding the concatenation xy of two strings x and y to the data structure needs time O(log n(log m log* m + log n)) for the mth operation, where n is the length of the resulting string (hence, log(n) ≤ m). This leads to a cubic time algorithm for checking equality of SLP-compressed strings. An improvement of the data structure from [105] was obtained by Alstrup, Brodal, and Rauhe [5]; see [4] for a long version. There, the complexity of adding the concatenation xy of two strings x and y to the data structure is improved to O(log n(log* m + log |Σ|)), where n and m have the same meaning as above and Σ is the underlying alphabet. This leads to the complexity of O(n^2 log* n) (where n = |A| + |B|) for checking equality of SLP-compressed strings, which is currently the best upper bound. One can view the algorithms from [5, 105] as efficient methods for transforming SLPs A and B into canonical SLPs A′ and B′ such that eval(A) = eval(B) if and only if A′ and B′ are identical. One should not... |

11 | Input-driven languages are in log n depth.
- Dymond
- 1988
Citation Context ...) = L(V′), the class of visibly pushdown languages is effectively closed under boolean operations, and equivalence, inclusion, and intersection non-emptiness are decidable for VPAs, see [6], where precise complexity results are also shown. Moreover, the membership problem also seems to be easier for visibly pushdown languages than for general context-free languages. By a classical result from [51], there exists a context-free language with a LOGCFL-complete membership problem. For visibly pushdown languages the complexity of the membership problem decreases to the circuit complexity class NC1 [35] and is therefore of the same complexity as for regular languages [9] (complexity theorists conjecture that NC1 is a proper subclass of LOGCFL). In contrast to this, by the following theorem, compressed membership is in general PSPACE-complete even for linear visibly pushdown languages, whereas it is P-complete for regular languages (Theorem 18): Theorem 23 ([88]) There is a fixed 1-turn visibly pushdown language with a PSPACE-complete compressed membership problem. This result can be deduced from Theorem 14. By the main result of [60] there exists a regular language K ⊆ {0, 1}^* such that for ... |

11 | Faster fully compressed pattern matching by recompression.
- Jez
- 2012
Citation Context ... There are dozens of linear time algorithms for this problem on uncompressed strings, most notably the well-known Knuth-Morris-Pratt algorithm [75]. It is therefore natural to ask whether a polynomial time algorithm for pattern matching on SLP-compressed strings exists; this problem is sometimes called fully compressed pattern matching and is defined as follows: input: Two SLPs P and T question: Is eval(P) a factor of eval(T)? The first polynomial time algorithm for fully compressed pattern matching was presented in [73], further improvements with respect to the running time were achieved in [42, 67, 81, 106]. Theorem 12 ([42, 67, 73, 81, 106]) Fully compressed pattern matching can be solved in polynomial time. Basically, the algorithms from [42, 67, 73, 81, 106] can be divided into two classes: those that are based on the periodicity lemma of Fine and Wilf [39] mentioned in the previous section [42, 73, 81, 106], and Jez's solution from [67] that does not use any non-trivial combinatorial properties of words. An important observation that is crucial in [42, 67, 73, 81, 106] is the obvious fact that if a pattern p is a factor of eval(T) then there exists a production X → YZ in T such that p ... |

10 | Tree structure compression with RePair.
- Lohrey, Maneth, et al.
- 2011
Citation Context ...a) A_i(y_1) → A_{i+1}(A_{i+1}(y_1)) for 0 ≤ i < n, A_n(y_1) → f(y_1, y_1). Then eval(A_n) is a complete binary tree of height 2^n. Thus, |eval(A_n)| = 2 · 2^{2^n} − 1. Example 37 motivates the definition of linear TSLPs. A TSLP A = (N, S, P) is linear if for every rule (A(y_1, . . . , y_n) → t) ∈ P, each of the parameters y_1, . . . , y_n occurs exactly once in the tree t. The TSLP from Example 36 is linear. In contrast to non-linear TSLPs, for a linear TSLP A the size of the tree eval(A) is bounded by 2^{O(|A|)}. Efficient algorithms that generate for a given input tree t a linear TSLP A with eval(A) = t are described in [18, 91] and [2] (for a slightly different type of grammars). A TSLP is nothing other than a context-free tree grammar that generates a single tree. An extension of TSLPs to higher order tree grammars was recently proposed in [76]. Let us now consider algorithmic problems for TSLP-compressed trees. Concerning equality checking, we have the following: Theorem 38 ([18, 126]) For two given TSLPs A and B it can be checked in PSPACE, whether eval(A) = eval(B). For two given linear TSLPs A and B it can be checked in P, whether eval(A) = eval(B). For the PSPACE-bound in the first statement, one has to noti... |
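The size claim in the example can be verified without ever building the doubly exponential tree, by tracking sizes only (a sketch under the rules stated above):

```python
def tslp_size(n):
    """Size of the tree produced by A_i(y) -> A_{i+1}(A_{i+1}(y)) for
    0 <= i < n and A_n(y) -> f(y, y), applied to a single leaf."""
    def g(i, s):              # size after applying eval(A_i) to a tree of size s
        if i == n:
            return 2 * s + 1  # f(t, t) has size 2|t| + 1
        return g(i + 1, g(i + 1, s))
    return g(0, 1)

# matches the closed form 2 * 2^(2^n) - 1 from the example
for n in range(4):
    print(n, tslp_size(n))
```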

10 | Efficient computation in groups via compression.
- Lohrey, Schleimer
- 2007
Citation Context ...t makes sense to design algorithms that directly operate on the compressed string representation in order to save the time and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples of this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algori... |

9 | Finite monoids: From word to circuit evaluation.
- Beaudry, McKenzie, et al.
- 1997
Citation Context ...age L ⊆ {0, 1}^*, the compressed membership problem for the language L is complete for the logspace leaf language class LEAFlog(L). If we fix some formalism for describing languages, we may also consider a uniform version of the compressed membership problem, where the language L is part of the input. In this section, we will survey results on the complexity of this problem for various kinds of automata, regular expressions, and grammars. 9.1 Regular languages Let us start with regular languages, the most basic languages in formal language theory. Theorem 18 ([10, 100, 115]) The following holds: • For a given nondeterministic finite automaton A with n states and an SLP A it can be checked in time O(|A| · n^3) whether eval(A) ∈ L(A). • There exists a fixed regular language with a P-complete compressed membership problem. The first statement is easy to show. Let A be an SLP in Chomsky normal form over the alphabet Σ and let A be a nondeterministic finite automaton with state set Q and input alphabet Σ. W.l.o.g. assume that Q = {1, . . . , n}. The set of transition tuples of A can be represented by a bunch of boolean matrices B_a ∈ {0, 1}^{n×n}, one for each input symb... |
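A sketch of this matrix-based check (pure Python, encoding hypothetical): for each nonterminal X of the CNF SLP we compute a boolean matrix M_X with M_X[q][r] = 1 iff the automaton can move from state q to state r while reading eval(X); a production X → YZ gives M_X = M_Y M_Z as a boolean product:

```python
def bool_mul(A, B):
    """Boolean n x n matrix product."""
    n = len(A)
    return [[1 if any(A[i][t] and B[t][j] for t in range(n)) else 0
             for j in range(n)] for i in range(n)]

def nfa_accepts_slp(rules, start, trans, init, finals):
    """rules: nonterminal -> terminal char or pair (Y, Z);
    trans[a]: boolean transition matrix of the NFA for letter a."""
    M = {}
    def mat(X):
        if X not in M:
            rhs = rules[X]
            M[X] = trans[rhs] if isinstance(rhs, str) else bool_mul(mat(rhs[0]), mat(rhs[1]))
        return M[X]
    m = mat(start)
    return any(m[q][r] for q in init for r in finals)

# NFA (here in fact deterministic) over {a, b} accepting words ending in b
trans = {"a": [[1, 0], [1, 0]], "b": [[0, 1], [0, 1]]}
rules = {"S": ("X", "X"), "X": ("A", "B"), "A": "a", "B": "b"}  # eval(S) = "abab"
print(nfa_accepts_slp(rules, "S", trans, {0}, {1}))  # True
```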

9 | Fast q-gram mining on SLP compressed strings.
- Goto, Bannai, et al.
- 2011
Citation Context ...f-index is to store an SLP A with small space overhead in such a form that for a given uncompressed pattern p one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: Here, for a given pattern p, SLP T, and k ∈ N the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The problem of computing q-gram frequencies for SLP-compressed strings is studied in [49, 50]. A q-gram is just a string of length q. An algorithm that computes in time O(q · n) for a Chomsky normal form SLP A of size n a list with the frequencies of all q-grams occurring in eval(A) is presented in [49]. This algorithm can be seen as a refinement of the algorithm for semi-compressed pattern matching outlined above. In [103], it is shown that the length of the longest common factor of two SLP-compressed strings can be computed in polynomial time. 7 Leaf languages and string compression In this section, we discuss a technique that turned out to be useful for deriving lower complexity bo... |

9 | Gleichungen mit regulären Randbedingungen über freien Gruppen [Equations with regular constraints over free groups]
- Hagenah
- 2000
Citation Context ...e generally, polynomial rings, see e.g. [3, 63, 70]. For some applications, an extension of SLPs, called composition systems in [43], is useful.1 A composition system A is defined as an SLP but may also contain productions of the form X → Y[i, j] for i, j ∈ N with 1 ≤ i ≤ j. Assume that the string w = eval_A(Y) is already defined. Then we let eval_A(X) = w[i, min{|w|, j}]. A composition system is in Chomsky normal form if all productions have the form X → a, X → YZ or X → Y[i, j] for nonterminals X, Y, and Z and a terminal symbol a. The following result was shown by Hagenah in his PhD thesis [55] (in German), see also [125]. Theorem 4 From a given composition system A in Chomsky normal form with n nonterminals one can compute in time O(n^2) an SLP B of size O(n^2) such that eval(B) = eval(A). 3 Straight-line programs and dictionary-based compression As already remarked in the Introduction, there is a tight relationship between straight-line programs and dictionary-based compression, in particular, the LZ77 compression scheme. There are various variants of LZ77 compression. The following variant is used for instance in [119]. 1The formalism in [43] differs in some minor details. Moreover... |

9 | Faster algorithm for bisimulation equivalence of normed context-free processes
- Lasota, Rytter
Citation Context ...ime and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples of this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as speeding up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed representation we want to use. Such a r... |

8 | Classifying polynomials and identity testing.
- Agrawal, Saptharishi
- 2009
Citation Context ... and assume that B already contains enough productions such that eval_B(Y′) (resp., eval_B(Z′)) is the reduced normal form of eval_A(Y) (resp., eval_A(Z)). Let eval_B(Y′) = a_1 a_2 · · · a_m and eval_B(Z′) = b_1 b_2 · · · b_k. In the concatenation eval_B(Y′) eval_B(Z′), cancellation may only occur at the border between eval_B(Y′) and eval_B(Z′). We have to compute the length n of the longest suffix of eval_B(Y′) that cancels against the length-n prefix of eval_B(Z′), i.e., we compute the maximal number n such that (eval_B(Y′)[m − n + 1, m])^{−1} = a_m^{−1} a_{m−1}^{−1} · · · a_{m−n+1}^{−1} = b_1 b_2 · · · b_n = eval_B(Z′)[1, n] (1). For a particular n we can check equation (1) in polynomial time by the extension of Theorem 9 to composition systems (which holds by Theorem 4). Now, the maximal n satisfying (1) can be computed in polynomial time as well using binary search. Finally, we add to the composition system B the production X′ → Y′[1, m − n] Z′[n + 1, k]. A direct corollary of Theorems 26 and 28 is the following result, which solves an open problem from [71]. Corollary 29 ([125]) Let F be a finitely generated free group. The word problem for Aut(F) (the automorphism group of F, which is finitely generated) can... |
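The binary search can be illustrated on uncompressed reduced words over {a, b}, writing A and B for the inverse letters (all names hypothetical). The key point is monotonicity: if the length-n suffix cancels, so does every shorter suffix. On composition systems, each equality test in the loop becomes one compressed equality check:

```python
def inv(w):
    """Formal inverse: reverse the word and invert each letter (a <-> A)."""
    return "".join(c.swapcase() for c in reversed(w))

def cancel_len(u, v):
    """Maximal n such that the length-n suffix of u cancels against the
    length-n prefix of v, found by binary search over n."""
    lo, hi = 0, min(len(u), len(v))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if inv(u[len(u) - mid:]) == v[:mid]:
            lo = mid
        else:
            hi = mid - 1
    return lo

u, v = "ab", "Ba"              # u v = a b b^-1 a
n = cancel_len(u, v)
print(u[:len(u) - n] + v[n:])  # -> aa, the reduced form of u v
```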

8 | Optimal pattern matching in LZW compressed strings.
- Gawrychowski
- 2011
Citation Context ...he pattern and the text are compressed. In practical applications, the pattern is usually short in comparison to the text. In such a situation, it makes sense to consider the weaker variant, where only the text is compressed, but the pattern is given explicitly. This leads to the semi-compressed pattern matching problem (sometimes just called the compressed pattern matching problem), where it is asked whether an uncompressed pattern p is a factor of a compressed text eval(T). Semi-compressed pattern matching has been studied intensively for various compression schemes, e.g. LZW (Lempel–Ziv–Welch) [7, 44] and LZ1 (a variant of LZ77 as defined in Section 3) [38, 45]. For SLP-compressed texts it is easy to show that compressed pattern matching can be solved in time O(|p| · |T|). Assume that the text SLP T is in Chomsky normal form. As remarked earlier, it suffices to check for each nonterminal X of T whether there exists an occurrence of p in eval_T(X) that touches the cut of X. For this, one computes for every nonterminal X the prefix pref_X of eval_T(X) of length min{|p|, |eval_T(X)|} as well as the suffix suff_X of eval_T(X) of length min{|p|, |eval_T(X)|}. This is possible in time O(|p| · |T|) by a ... |
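A sketch of this O(|p| · |T|) procedure (encoding hypothetical, nonempty pattern assumed). For X → YZ we have pref_X = (pref_Y pref_Z) truncated to |p| symbols, because pref_Y is all of eval(Y) whenever |eval(Y)| < |p|; suff_X is symmetric, and every occurrence touching the cut of X lies inside suff_Y pref_Z:

```python
def semi_compressed_match(p, rules, start):
    """Does the uncompressed, nonempty pattern p occur in eval(start)?
    rules: nonterminal -> terminal char or pair (Y, Z), CNF assumed."""
    k = len(p)
    pref, suff, found = {}, {}, {}
    def process(X):
        if X in pref:
            return
        rhs = rules[X]
        if isinstance(rhs, str):
            pref[X] = suff[X] = rhs
            found[X] = (rhs == p)
        else:
            Y, Z = rhs
            process(Y)
            process(Z)
            cut = suff[Y] + pref[Z]            # length at most 2|p|
            found[X] = found[Y] or found[Z] or (p in cut)
            pref[X] = (pref[Y] + pref[Z])[:k]  # prefix of eval(X), length <= |p|
            suff[X] = (suff[Y] + suff[Z])[-k:]
    process(start)
    return found[start]

rules = {"S": ("X", "X"), "X": ("A", "B"), "A": "a", "B": "b"}  # eval(S) = "abab"
print(semi_compressed_match("ba", rules, "S"))  # True
```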

7 | A bisection algorithm for grammar-based compression of ordered trees.
- Akutsu
- 2010
Citation Context ...A_{i+1}(A_{i+1}(y_1)) for 0 ≤ i < n, A_n(y_1) → f(y_1, y_1). Then eval(A_n) is a complete binary tree of height 2^n. Thus, |eval(A_n)| = 2 · 2^{2^n} − 1. Example 37 motivates the definition of linear TSLPs. A TSLP A = (N, S, P) is linear if for every rule (A(y_1, . . . , y_n) → t) ∈ P, each of the parameters y_1, . . . , y_n occurs exactly once in the tree t. The TSLP from Example 36 is linear. In contrast to non-linear TSLPs, for a linear TSLP A the size of the tree eval(A) is bounded by 2^{O(|A|)}. Efficient algorithms that generate for a given input tree t a linear TSLP A with eval(A) = t are described in [18, 91] and [2] (for a slightly different type of grammars). A TSLP is nothing other than a context-free tree grammar that generates a single tree. An extension of TSLPs to higher order tree grammars was recently proposed in [76]. Let us now consider algorithmic problems for TSLP-compressed trees. Concerning equality checking, we have the following: Theorem 38 ([18, 126]) For two given TSLPs A and B it can be checked in PSPACE, whether eval(A) = eval(B). For two given linear TSLPs A and B it can be checked in P, whether eval(A) = eval(B). For the PSPACE-bound in the first statement, one has to notice that ... |

7 | Pattern matching in Lempel-Ziv compressed strings: Fast, simple, and deterministic.
- Gawrychowski
- 2011
Citation Context ...tions, the pattern is usually short in comparison to the text. In such a situation, it makes sense to consider the weaker variant, where only the text is compressed, but the pattern is given explicitly. This leads to the semi-compressed pattern matching problem (sometimes just called the compressed pattern matching problem), where it is asked whether an uncompressed pattern p is a factor of a compressed text eval(T). Semi-compressed pattern matching has been studied intensively for various compression schemes, e.g. LZW (Lempel–Ziv–Welch) [7, 44] and LZ1 (a variant of LZ77 as defined in Section 3) [38, 45]. For SLP-compressed texts it is easy to show that compressed pattern matching can be solved in time O(|p| · |T|). Assume that the text SLP T is in Chomsky normal form. As remarked earlier, it suffices to check for each nonterminal X of T whether there exists an occurrence of p in eval_T(X) that touches the cut of X. For this, one computes for every nonterminal X the prefix pref_X of eval_T(X) of length min{|p|, |eval_T(X)|} as well as the suffix suff_X of eval_T(X) of length min{|p|, |eval_T(X)|}. This is possible in time O(|p| · |T|) by a straightforward bottom-up computation. Then, one only has to ... |

6 | Literal shuffle of compressed words.
- Bertoni, Choffrut, et al.
- 2008
Citation Context ...tion than the standard n^3 school method, the time bound O(|A| · n^3) can be improved. For about 20 years, the Coppersmith–Winograd algorithm was the asymptotically best algorithm for matrix multiplication, with a complexity of O(n^2.3755). This bound was recently improved by Williams [29] to O(n^2.3727). For a given deterministic finite automaton A with n states and an SLP A, one can verify eval(A) ∈ L(A) in time O(|A| · n) [115]. Instead of |A| many boolean matrix multiplications, only |A| compositions of functions from {1, . . . , n} to {1, . . . , n} are necessary in the deterministic case. In [12] the following generalization has been shown: From a given SLP A over an alphabet Σ and a deterministic finite state transducer τ with input alphabet Σ and n states, one can compute in time O(|A| · n) an SLP B of size O(|A| · n) for the output string τ(eval(A)). The second statement of Theorem 18 is stated in [100]. The proof is based on a result from [10], which states that there exists a fixed finite monoid M (in fact, any finite non-solvable group can be taken for M) such that the circuit evaluation problem for M is P-complete. The latter problem asks whether a given circuit over M (i.e., a... |
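For a deterministic automaton the boolean matrices collapse to functions from states to states, and each production needs only a single table composition (a sketch, hypothetical encoding as before):

```python
def dfa_accepts_slp(rules, start, delta, n, q0, finals):
    """delta[q][a] is the DFA transition function; for X -> YZ the state
    transformation of eval(X) is f_X = f_Z composed with f_Y."""
    F = {}
    def f(X):
        if X not in F:
            rhs = rules[X]
            if isinstance(rhs, str):                       # X -> a
                F[X] = tuple(delta[q][rhs] for q in range(n))
            else:                                          # X -> Y Z
                fY, fZ = f(rhs[0]), f(rhs[1])
                F[X] = tuple(fZ[fY[q]] for q in range(n))  # read Y first, then Z
        return F[X]
    return f(start)[q0] in finals

# DFA over {a, b} accepting words that end in b; eval(S) = "abab"
delta = [{"a": 0, "b": 1}, {"a": 0, "b": 1}]
rules = {"S": ("X", "X"), "X": ("A", "B"), "A": "a", "B": "b"}
print(dfa_accepts_slp(rules, "S", delta, 2, 0, {1}))  # True
```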

6 | ε-productions in context-free grammars.
- Goldschlager
- 1981
Citation Context ...this section, we consider the compressed membership problem for context-free languages. It is easy to see that for every context-free language the compressed membership problem can be solved in polynomial space. This result even holds in a uniform setting, where the input consists of a context-free grammar G and an SLP A, and it is asked whether eval(A) ∈ L(G): Theorem 22 ([85]) For a given context-free grammar G and an SLP A it can be checked in polynomial space whether eval(A) ∈ L(G). First, one has to transform the grammar G into Chomsky normal form; this is possible in polynomial time. By [48] the uniform (uncompressed) membership problem for context-free grammars in Chomsky normal form can be solved in DSPACE(log^2(|G| + |w|)), where |G| is the size of the input grammar (in Chomsky normal form) and w is the word that has to be tested for membership. This algorithm can be used to check for a given context-free grammar G in Chomsky normal form and an SLP A in DSPACE(log^2(|G| + 2^{|A|})), i.e., in polynomial space, whether eval(A) ∈ L(G). In [85], a fixed deterministic context-free language with a PSPACE-complete compressed membership problem was constructed using a generic reduction from... |

6 | Functional programs as compressed data.
- Kobayashi, Matsuda, et al.
- 2012
Citation Context ...linear if for every rule (A(y_1, . . . , y_n) → t) ∈ P, each of the parameters y_1, . . . , y_n occurs exactly once in the tree t. The TSLP from Example 36 is linear. In contrast to non-linear TSLPs, for a linear TSLP A the size of the tree eval(A) is bounded by 2^{O(|A|)}. Efficient algorithms that generate for a given input tree t a linear TSLP A with eval(A) = t are described in [18, 91] and [2] (for a slightly different type of grammars). A TSLP is nothing other than a context-free tree grammar that generates a single tree. An extension of TSLPs to higher order tree grammars was recently proposed in [76]. Let us now consider algorithmic problems for TSLP-compressed trees. Concerning equality checking, we have the following: Theorem 38 ([18, 126]) For two given TSLPs A and B it can be checked in PSPACE, whether eval(A) = eval(B). For two given linear TSLPs A and B it can be checked in P, whether eval(A) = eval(B). For the PSPACE-bound in the first statement, one has to notice that a TSLP A can also be viewed as a graph generating grammar that produces a node-labelled directed acyclic graph dag(A), whose unfolding is eval(A). This graph grammar is obtained by merging in a right-hand side t(y... |

6 | The complexity of monadic second-order unification.
- Levy, Schmidt-Schauß, et al.
- 2008
Citation Context ...(A) [90]. • It is P-complete to check for a given linear TSLP A and a given tree automaton A whether A accepts eval(A) [92]. The polynomial time algorithm in the linear case works in two steps: (1) Transform in polynomial time the input TSLP A into a TSLP B such that eval(A) = eval(B) and every nonterminal of B has rank at most one. This is the difficult step in [92]. (2) Evaluate the input tree automaton A on eval(B) in polynomial time, using the fact that every nonterminal of B has rank at most one, see [90]. Several other algorithmic problems for TSLP-compressed input trees are studied in [20, 41, 80, 127, 128]. 15.2 Pictures A picture over the alphabet Σ is a mapping p : {1, . . . , n} × {1, . . . , m} → Σ for some n, m ≥ 1. We say that (n, m) is the dimension of the picture p. For more information on pictures see [47]. A picture can be viewed as a 2-dimensional word. We define two partial concatenation operations for pictures. Given two pictures p1 and p2, where p1 has dimension (n, m) and p2 has dimension (n, k), we define the horizontal concatenation (p1, p2) : {1, . . . , n} × {1, . . . , m + k} → Σ (a picture of dimension (n, m + k)) as follows, where x ∈ {1, . . . , n} and y ∈ {1, . . . , m + k}: (p... |
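The horizontal concatenation defined above (and its vertical counterpart) can be sketched on pictures stored as lists of rows; the function names are illustrative, not taken from the survey:

```python
# A picture of dimension (n, m) is stored as a list of n rows, each of length m.
# hconcat/vconcat are hypothetical names for the two partial operations.

def hconcat(p1, p2):
    """Horizontal concatenation: only defined if p1 and p2 have the same height n."""
    if len(p1) != len(p2):
        raise ValueError("horizontal concatenation needs equal height")
    return [r1 + r2 for r1, r2 in zip(p1, p2)]

def vconcat(p1, p2):
    """Vertical concatenation: only defined if p1 and p2 have the same width."""
    if len(p1[0]) != len(p2[0]):
        raise ValueError("vertical concatenation needs equal width")
    return p1 + p2

p1 = ["ab", "cd"]          # dimension (2, 2)
p2 = ["x", "y"]            # dimension (2, 1)
print(hconcat(p1, p2))     # ['abx', 'cdy']  -- dimension (2, 3)
```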

6 | Compressed membership problems for regular expressions and hierarchical automata.
- Lohrey
- 2010
Citation Context ...egular expressions. It characterizes the complexity for all operator sets {·,∪} ⊆ C ⊆ {·,∪, ∗,∩,¬, ||, 2}. The class ATIME(exp(n), O(n)) denotes the class of all problems that can be solved on an alternating Turing-machine in exponential time, but the number of alternations (i.e., transitions between an existential and a universal state or vice versa) has to be bounded by O(n) on every computation path. Completeness results for ATIME(exp(n), O(n)) are typical for logical theories [28], but we are not aware of any other completeness results for this class in formal language theory. Theorem 19 ([86]) Let {·,∪} ⊆ C ⊆ {·,∪, ∗,∩,¬, ||, 2}. The compressed membership problem for C is (1) in P if C ⊆ {·,∪,∩} or C ⊆ {·,∪, ∗}, (2) NP-complete if {·,∪, ||} ⊆ C ⊆ {·,∪,∩, ||}, (3) PSPACE-complete if (i) ¬ ∉ C or || ∉ C and (ii) C contains {·,∪,¬} or {·,∪, 2} or {·,∪, ∗,∩} or {·,∪, ∗, ||}, and (4) ATIME(exp(n), O(n))-complete if {·,∪,¬, ||} ⊆ C. The first statement (membership in P for C ⊆ {·,∪,∩} or C ⊆ {·,∪, ∗}) is clear: The case C ⊆ {·,∪, ∗} reduces to the uniform compressed membership problem for nondeterministic finite automata (since {·,∪, ∗}-expressions can be transformed in polynomial t... |

6 | Leaf languages and string compression.
- Lohrey
- 2011
Citation Context ... distribution, then PP can be defined as the class of all languages L for which there exists a probabilistic machine M such that for all inputs x: x ∈ L if and only if M accepts x with probability > 1/2. A language K belongs to the leaf language class defined by L if there exists a nondeterministic logspace Turing-machine M with input alphabet Σ such that for every word w ∈ Σ∗ we have: w ∈ K if and only if leaf(M,w) ∈ L. Note that the machine M together with an input w can be seen as a compressed representation of the string leaf(M,w). In a certain sense, if the machine M is logspace-bounded, then this form of compression is equivalent to SLP-compression: Theorem 13 ([88]) The following holds: • Let M be an adequate nondeterministic logspace Turing-machine. From a given input w ∈ Σ∗ for M we can construct in logspace an SLP A over {0, 1} such that eval(A) = leaf(M,w). • There exists an adequate nondeterministic logspace Turing-machine M that takes as input an SLP A in Chomsky normal form over the alphabet {0, 1} and produces the leaf string leaf(M, A) = eval(A). The proof of this result is quite simple. For the first statement, let n be the length of the input w and assume that M works in space log(n) on an input of length n. Then the nonterminals of the SLP A... |

6 | Model-checking hierarchical structures.
- Lohrey
- 2012
Citation Context ...tion is mainly taken from [99], where it was shown that model checking the temporal logic LTL on SLP-compressed strings is PSPACE-complete. PSPACE-hardness for {·,∪, 2} and {·,∪, ∗,∩} is shown by generic reductions from the acceptance problem for a polynomial space bounded machine. Finally, the PSPACE lower bound for {·,∪, ∗, ||} is proved by a reduction from the intersection nonemptiness problem for regular expressions (over the standard operator set {·,∪, ∗}), which is a well-known PSPACE-complete problem [77]. The ATIME(exp(n), O(n))-hardness part from (4) uses a corresponding result from [89] on the model-checking problem for monadic second-order logic over SLP-compressed words over a unary alphabet. 9.3 Compressed finite automata A compressed nondeterministic finite automaton (CNFA for short) is a tuple A = (Q,Σ, δ, q0, F ), where Q is a finite set of states, Σ is a finite alphabet, q0 ∈ Q is the initial state, F ⊆ Q is the set of final states, and δ is a finite set of transitions of the form (p, A, q), where p and q are states and A is an SLP over Σ. The size of A is |A| = |Q| + ∑_{(p,A,q)∈δ} |A|. We say that a word w labels a path from state p to state q in A if there exists a seq... |

5 | Über die Topologie des dreidimensionalen Raumes.
- Dehn
- 1910
Citation Context ...nerating set for G. This means that every element of G can be written as a product of elements from Σ∪Σ−1. The word problem of G with respect to Σ is the following decision problem: input: A word w ∈ (Σ ∪ Σ−1)∗ question: Does w = 1 hold in the group G? It is well known and easy to see that if Γ is another finite generating set for G, then the word problem for G with respect to Σ is logspace many-one reducible to the word problem for G with respect to Γ. This justifies speaking simply of the word problem for the group G. The word problem was introduced in the pioneering work of Dehn from 1910 [32] in relation to topological questions. Novikov [109] and independently Boone [16] constructed examples of finitely presented groups (i.e., finitely generated groups with only finitely many defining relations) with an undecidable word problem. Despite this negative result, many natural classes of groups with decidable word problems were found. Prominent examples are for instance finitely generated linear groups, automatic groups [36], and one-relator groups. With the advent of computational complexity theory, the complexity of word problems also became an active research area. For instance, i... |
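For the free group over Σ, the word problem defined above has a particularly simple solution by free cancellation; the following sketch (with an assumed encoding of the inverse of 'a' as 'a-') illustrates the decision problem in that special case, not Dehn's original algorithm:

```python
def free_group_word_problem(word):
    """Decide whether w = 1 in the free group F(Σ).  A letter is a string such
    as 'a'; its inverse is 'a-'.  This encoding is an assumption of the sketch."""
    stack = []
    for x in word:
        inv = x[:-1] if x.endswith('-') else x + '-'
        if stack and stack[-1] == inv:
            stack.pop()          # cancel an adjacent pair x x^{-1}
        else:
            stack.append(x)
    return not stack             # w = 1 iff everything cancels

print(free_group_word_problem(['a', 'b', 'b-', 'a-']))  # True:  a b b^{-1} a^{-1} = 1
print(free_group_word_problem(['a', 'b', 'a-', 'b-']))  # False: a commutator is non-trivial
```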

5 | Recompression: a simple and powerful technique for word equations
- Jeż
- 2013
Citation Context ...ution for a word equation u = v. We say that σ is minimal, if for every solution σ′ of u = v we have |σ(u)| ≤ |σ′(u)|. We say that |σ(u)| is the length of a minimal solution of u = v. By the following result of Plandowski and Rytter [114], minimal solutions for word equations are highly compressible: Theorem 33 ([114]) Let u = v be a word equation over (Σ,X ) and let n = |uv|. Assume that u = v has a solution and let N be the length of a minimal solution of u = v. Then, for every minimal solution σ of u = v, the word σ(u) can be generated by an SLP of size O(n² log²(N)(log n + log log N)). In [68], Jez applied his recompression technique from [67, 66] (see Section 6) to word equations and obtained an alternative PSPACE-algorithm for solving word equations. Moreover, his technique yields a O(poly(n) log N) bound in Theorem 33 (instead of the O(n² log²(N)(log n + log log N)) bound); this result is not explicitly stated in [68]. 13 Computational topology In [37, 122, 124] efficient algorithms for SLP-compressed strings are used to solve various problems in computational topology efficiently. Fix a connected, compact, and orientable surface M , possibly with a boundary ∂M , see for instance... |

5 | The iterated mod problem.
- Karloff, Ruzzo
- 1989
Citation Context ...ing carries a certain symbol. Let us formally define the compressed querying problem as follows: input: An SLP A over an alphabet Σ, a binary-coded number 1 ≤ i ≤ |eval(A)|, and a symbol a ∈ Σ. question: Does eval(A)[i] = a hold? (The proof of this result again uses Barrington's technique mentioned at the end of Section 9.1.) Theorem 24 ([82]) Compressed querying is P-complete. We already argued in Section 2.2 that compressed querying can be solved in polynomial time. P-completeness of compressed querying is shown in [82] by a reduction from the P-complete superincreasing subset sum problem [72], which is the following problem: input: Binary coded natural numbers t, w1, . . . , wn such that wk > ∑_{i=1}^{k−1} wi for all 1 ≤ k ≤ n. question: Do there exist b1, . . . , bn ∈ {0, 1} such that t = ∑_{i=1}^{n} bi wi? Assume that w1, . . . , wn are binary coded natural numbers such that wk > ∑_{i=1}^{k−1} wi for all 1 ≤ k ≤ n. In [82], it is shown how to construct in logspace an SLP A over the alphabet {a, b} such that |eval(A)| = 1 + ∑_{i=1}^{n} wi and for all 0 ≤ m ≤ ∑_{i=1}^{n} wi the following holds: eval(A)[m + 1] = b if and only if there exist b1, . . . , bn ∈ {0, 1} such that m = ∑_{i=1}^{n} bi wi. The construction... |
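The polynomial-time upper bound for compressed querying mentioned above (precompute |eval(X)| for every nonterminal, then descend through the derivation) can be sketched as follows; the dict encoding of Chomsky normal form rules is an assumption of this sketch:

```python
from functools import lru_cache

# Rules in Chomsky normal form: X -> (Y, Z) or X -> a single terminal character.
rules = {'S': ('A', 'A'), 'A': ('B', 'C'), 'B': 'a', 'C': 'b'}  # eval(S) = 'abab'

@lru_cache(maxsize=None)
def length(x):
    """|eval(x)|, computed bottom-up with memoization."""
    rhs = rules[x]
    return 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])

def query(x, i):
    """Return eval(x)[i] (0-based) by descending the SLP, O(depth) per query."""
    rhs = rules[x]
    if isinstance(rhs, str):
        return rhs
    y, z = rhs
    return query(y, i) if i < length(y) else query(z, i - length(y))

print([query('S', i) for i in range(length('S'))])  # ['a', 'b', 'a', 'b']
```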

5 | Compressed membership in automata with compressed labels.
- Lohrey, Mathissen
- 2011
Citation Context ...his recompression technique that he later applied in [67] to get his O((|T| + |P|) log(M) log(|T| + |P|))-algorithm for fully compressed pattern matching, see Section 6. The order of a word u is the largest number n such that u can be written as u = v^n w, where w is a prefix of v. If A1, . . . , An is a list of all SLPs that occur as labels in the CNFA A, then we set ord(A) = max{ord(eval(Ai)) | 1 ≤ i ≤ n}. Note that ord(A) is in general exponential in the size of A. By the following result, the compressed membership problem for CNFAs A with small ord(A) can be solved efficiently. Theorem 21 ([93]) Given a CNFA A and an SLP A, we can check eval(A) ∈ L(A) in time polynomial in |A|, |A|, and ord(A). 9.4 Context-free languages In this section, we consider the compressed membership problem for context-free languages. It is easy to see that for every context-free language the compressed membership problem can be solved in polynomial space. This result even holds in a uniform setting, where the input consists of a context-free grammar G and an SLP A, and it is asked whether eval(A) ∈ L(G): Theorem 22 ([85]) For a given context-free grammar G and an SLP A it can be checked in polynomial space... |
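For an explicitly given word u, the order ord(u) defined above equals ⌊|u|/p⌋, where p is the smallest period of u; p can be read off the KMP failure function in linear time. This is a standard stringology fact, not a construction from [93]:

```python
def ord_word(u):
    """Largest n with u = v^n w, w a prefix of v: equals |u| // (smallest period)."""
    m = len(u)
    fail = [0] * (m + 1)              # KMP failure function of u
    k = 0
    for i in range(1, m):
        while k and u[i] != u[k]:
            k = fail[k]
        if u[i] == u[k]:
            k += 1
        fail[i + 1] = k
    period = m - fail[m]              # smallest period of u
    return m // period

print(ord_word("abcabcab"))  # 2: "abcabcab" = (abc)^2 ab, and "ab" is a prefix of "abc"
print(ord_word("aaaa"))      # 4
```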

5 | Compressed words and automorphisms in fully residually free groups.
- Macdonald
- 2010
Citation Context ...t makes sense to design algorithms that directly operate on the compressed string representation in order to save the time and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples for this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as to speed up edit distance computation [31, 59]. When we talk about algori... |

4 | Tracing compressed curves in triangulated surfaces.
- Erickson, Nayyeri
- 2012
Citation Context ...ly operate on the compressed string representation in order to save the time and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples for this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as to speed up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to ... |

4 | Parameter reduction and automata evaluation for grammar-compressed trees.
- Lohrey, Maneth, et al.
- 2012
Citation Context ... ∈ Σ a natural number (the rank of a) is associated. If a tree node v is labelled with a symbol of rank n, then v has exactly n children, which are linearly ordered. Such trees can conveniently be represented as terms. The size |t| of a tree t is the number of nodes of t. Here is an example: Example 35 Let f be a symbol of rank 2, h a symbol of rank 1, and a a symbol of rank 0 (a constant). Then the term h(f(h(f(h(h(a)), a)), h(f(h(h(a)), a)))) corresponds to the node-labelled tree of size 14, shown in Figure 4. A tree straight-line program (TSLP for short and also called SLCF tree grammar in [90, 92]) over the terminal alphabet Σ (which is a ranked alphabet in the above sense) is a tuple A = (N,Σ, S, P ), such that N is a finite set of ranked symbols (the nonterminals), S ∈ N has rank 0 (the initial nonterminal) and P is a finite set of productions of the form A(y1, . . . , yn) → t where A is a nonterminal of rank n and t is a tree built up from the ranked symbols in Σ ∪ N and the parameters y1, . . . , yn which are considered as symbols of rank 0 (i.e., constants). Moreover, as for ordinary SLPs, it is required that every nonterminal occurs on the left-hand side of exactly one production... |
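A minimal TSLP evaluator can be sketched as follows. The productions below are hypothetical (the survey's Example 36 is not reproduced here), but they do generate the tree of Example 35, because its two large subtrees are equal and can be shared through a parameterized nonterminal:

```python
# Trees are nested tuples (label, child1, ...).  Productions map a nonterminal
# to (parameter names, right-hand side); a parameter appears as a bare string.
# Hypothetical TSLP:  S -> h(f(B(a), B(a))),  B(y) -> h(f(h(h(y)), y)).
productions = {
    'S': ([], ('h', ('f', ('B', ('a',)), ('B', ('a',))))),
    'B': (['y'], ('h', ('f', ('h', ('h', 'y')), 'y'))),
}

def expand(t, env):
    """Expand a right-hand side to the tree it derives, substituting parameters."""
    if isinstance(t, str):                 # a parameter
        return env[t]
    label, *children = t
    if label in productions:               # a nonterminal application
        params, rhs = productions[label]
        args = [expand(c, env) for c in children]
        return expand(rhs, dict(zip(params, args)))
    return (label, *[expand(c, env) for c in children])

def size(t):
    return 1 + sum(size(c) for c in t[1:])

tree = expand(('S',), {})
print(size(tree))  # 14, the size of the tree from Example 35
```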

4 | A PTIME-complete matching problem for SLP-compressed words.
- Markey, Schnoebelen
- 2004
Citation Context ...m. The idea is to compute for every position 1 ≤ i ≤ m and every variable X of T the length ℓ(i,X) of the longest prefix of ai · · · am that is a subsequence of evalT(X). The values ℓ(i,X) satisfy a simple recursion, which allows computing all these values in time O(|p| · |T|). More complex problems related to semi-compressed subsequence matching (e.g. counting the number of shortest factors of eval(T) that contain p as a subsequence) are considered in [22, 134, 142]. The structural complexity of a generalization of semi-compressed subsequence matching was studied by Markey and Schnoebelen [100]. LOGCFL is the class of all languages that are logspace reducible to a context-free language [132]; it coincides with the class of all languages that can be recognized by a uniform boolean circuit family of polynomial size and logarithmic depth, where all ∧-gates have bounded fan-in and all ∨-gates have unbounded fan-in [137]. It is conjectured that LOGCFL is a proper subclass of P. Theorem 16 ([100]) The following problem belongs to LOGCFL: input: Strings p0, p1, . . . , pn ∈ Σ∗ and an SLP T over Σ question: Does eval(T) ∈ p0Σ∗p1Σ∗ · · · pn−1Σ∗pn hold? 9 Compressed membership problems The... |
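The recursion for ℓ(i,X) can be implemented by computing, for every nonterminal X, the "advance function" mapping a pattern position to the position reached after greedily scanning eval(X); for a rule X → YZ the function is the composition of Y's and Z's functions. The dict rule encoding is an assumption of this sketch:

```python
rules = {'S': ('A', 'A'), 'A': ('B', 'C'), 'B': 'a', 'C': 'b'}  # eval(S) = 'abab'

def is_subsequence(pattern, rules, start='S'):
    """Is pattern a subsequence of eval(start)?  Runs in time O(|pattern| * |rules|)."""
    m = len(pattern)
    adv = {}                         # adv[X][i] = pattern position after scanning eval(X) from i
    def advance(x):
        if x in adv:
            return adv[x]
        rhs = rules[x]
        if isinstance(rhs, str):     # terminal rule: match one symbol if possible
            f = [i + 1 if i < m and pattern[i] == rhs else i for i in range(m + 1)]
        else:                        # X -> Y Z: scan eval(Y), then eval(Z)
            fy, fz = advance(rhs[0]), advance(rhs[1])
            f = [fz[fy[i]] for i in range(m + 1)]
        adv[x] = f
        return f
    return advance(start)[0] == m

print(is_subsequence("aab", rules))  # True:  'aab' is a subsequence of 'abab'
print(is_subsequence("bbb", rules))  # False: 'abab' contains only two b's
```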

3 | On the complexity of optimal grammar-based compression.
- Arpe, Reischuk
- 2006
Citation Context ...[23]) Unless P = NP there is no polynomial time algorithm with the following specification: • The input consists of a string w over some alphabet Σ. • The output is a grammar A such that eval(A) = w and |A| ≤ 8569/8568 · opt(w). Theorem 7 is shown in [23] using a reduction from the vertex cover problem for graphs with maximal degree 3. In Theorem 7 it is important that the alphabet Σ is not fixed. It is an open problem whether there is a polynomial time algorithm that computes for a binary string w ∈ {0, 1}∗ an SLP A such that eval(A) = w and |A| = opt(w). Some partial results can be found in [8]. The best known upper bound for grammar-based compression is provided by the following result, which was shown independently by Charikar et al. [23] and Rytter [119]: Theorem 8 ([23, 119]) There is an O(log(|Σ|) · n)-time algorithm that computes for a given word w ∈ Σ∗ of length n an SLP A such that eval(A) = w and |A| ≤ O(log(n/opt(w)) · opt(w)) (i.e., the approximation ratio of the algorithm is O(log(n/opt(w)))). The algorithms of Charikar et al. and Rytter follow a similar strategy. First, the LZ77-encoding of the input string w is computed. This is possible in time O(log(|Σ|) · |w|) usin... |

3 | The word problem and power problem in 1-relator groups are primitive recursive.
- Cannonito, Gatterdam
- 1975
Citation Context ...ural classes of groups with decidable word problems were found. Prominent examples are for instance finitely generated linear groups, automatic groups [36], and one-relator groups. With the advent of computational complexity theory, the complexity of word problems also became an active research area. For instance, it was shown that for a finitely generated linear group the word problem can be solved in logarithmic space [84, 130], that automatic groups have polynomial time solvable (in fact quadratic) word problems [36], and that the word problem for a one-relator group is primitive recursive [19]. For a group G let Aut(G) be the automorphism group of G. In general, Aut(G) is not necessarily finitely generated even if G is finitely generated. For the investigation of the complexity of the word problem of a finitely generated subgroup of Aut(G), a compressed variant of the word problem for G turned out to be useful. Assume again that G is finitely generated by Σ. The compressed word problem of G with respect to Σ is the following computational problem: input: An SLP A over the alphabet Σ ∪ Σ−1 question: Does eval(A) = 1 hold in G? It is easy to see that also for the compressed word prob... |

3 | One-context unification with STG-compressed terms is in NP.
- Creus, Godoy
- 2012
Citation Context ...(A) [90]. • It is P-complete to check for a given linear TSLP A and a given tree automaton A whether A accepts eval(A) [92]. The polynomial time algorithm in the linear case works in two steps: (1) Transform in polynomial time the input TSLP A into a TSLP B such that eval(A) = eval(B) and every nonterminal of B has rank at most one. This is the difficult step in [92]. (2) Evaluate the input tree automaton A on eval(B) in polynomial time, using the fact that every nonterminal of B has rank at most one, see [90]. Several other algorithmic problems for TSLP-compressed input trees are studied in [20, 41, 80, 127, 128]. 15.2 Pictures A picture over the alphabet Σ is a mapping p : {1, . . . , n} × {1, . . . , m} → Σ for some n, m ≥ 1. We say that (n, m) is the dimension of the picture p. For more information on pictures see [47]. A picture can be viewed as a 2-dimensional word. We define two partial concatenation operations for pictures. Given two pictures p1 and p2, where p1 has dimension (n, m) and p2 has dimension (n, k), we define the horizontal concatenation (p1, p2) : {1, . . . , n} × {1, . . . , m + k} → Σ (a picture of dimension (n, m + k)) as follows, where x ∈ {1, . . . , n} and y ∈ {1, . . . , m + k}: (p... |

3 | Data structure lower bounds on random access to grammarcompressed strings.
- Chen, Verbin, et al.
- 2012
Citation Context ...the resulting balanced SLP is O(|A| · log(n)). Hence, the extra space increases from Θ(|A|) to O(|A| · log(n)). The following result from [15] combines a query time of O(log(n)) with a preprocessing time and extra space of O(|A|): Theorem 25 ([15]) From a given SLP A such that n = |eval(A)| one can compute on a RAM in (preprocessing) time O(|A|) a data structure of size O(|A|) that allows computing in (query) time O(log(n)) the symbol eval(A)[i] for a given binary coded number 1 ≤ i ≤ n. Some lower bounds on the preprocessing and query time for the compressed querying problem can be found in [24]. 11 Compressed word problems for groups In recent years, techniques for SLP-compressed strings were successfully applied in order to get more efficient algorithms for problems in combinatorial group theory. In this section, we briefly survey some of these results. We will restrict our consideration to the compressed word problem and its application to the word problem for automorphism groups. Background in combinatorial group theory can be found for instance in [96, 131]. Let G be a finitely generated group, and let Σ be a finite generating set for G. This means that every element of G can be... |

3 | Efficient algorithms for highly compressed data: The word problem in Higman's group is in P.
- Diekert, Laun, et al.
- 2012
Citation Context ...ynomial time hierarchy. It can be defined as the class of all languages L ⊆ Σ∗ such that there exists a polynomial time set P ⊆ Σ∗ × {0, 1}∗ × {0, 1}∗ and a polynomial p(x) such that for all u ∈ Σ∗ we have ({0, 1}^{≤m} denotes the set of binary words of length at most m): u ∈ L ⇐⇒ ∀v ∈ {0, 1}^{≤p(|u|)} ∃w ∈ {0, 1}^{≤p(|u|)} : (u, v, w) ∈ P The class Π^p_2 contains both NP and coNP. Recently, power circuits [107], which are a compression scheme for representing huge integers (of multiply exponential size), have been used for solving the word problem for the one-relator Baumslag group and Higman's group [34]. A group G is fully residually free if for every tuple (g1, . . . , gn) of elements from G\{1} there exists a free group F and a homomorphism ϕ : G → F such that ϕ(gi) ≠ 1 for all 1 ≤ i ≤ n. 12 Word equations Let Σ be a finite alphabet and X be a finite set of variables. A word equation over (Σ,X ) is a pair (u, v) with u, v ∈ (Σ ∪ X )∗. For a mapping σ : X → Σ∗ let σ̄ : (Σ ∪ X )∗ → Σ∗ be the unique homomorphism with σ̄(a) = a for a ∈ Σ and σ̄(x) = σ(x). A solution of the word equation (u, v) is a mapping σ : X → Σ∗ such that σ̄(u) = σ̄(v). In the following, we write a word equation (u, v... |
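Verifying that a candidate assignment solves a word equation is just evaluating the homomorphism σ̄ on both sides; a sketch with an assumed encoding (uppercase letters are variables, lowercase letters are constants):

```python
def solves(u, v, sigma):
    """Check whether the assignment sigma (variables -> strings) solves u = v.
    Uppercase characters are variables, lowercase characters constants; this
    encoding is an assumption of the sketch."""
    expand = lambda side: "".join(sigma.get(s, s) for s in side)  # the homomorphism
    return expand(u) == expand(v)

# The equation X a = a X is solved exactly by the powers X = a^n:
print(solves("Xa", "aX", {"X": "aa"}))  # True:  'aaa' = 'aaa'
print(solves("Xa", "aX", {"X": "ab"}))  # False: 'aba' != 'aab'
```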

2 | The inclusion problem of context-free languages: Some tractable cases.
- Bertoni, Choffrut, et al.
- 2011
Citation Context ...more efficient algorithm was presented. In both papers, checking bisimulation equivalence of normed context-free processes is reduced to checking equality of SLP-compressed words. In [52], Theorem 9 is applied in the area of program analysis. It is shown that the so-called interprocedural global value numbering problem for unary uninterpreted function symbols can be solved in polynomial time. (The geometric intersection number of α and β is the minimum of |α′ ∩ β′|, where α′ (resp., β′) is isotopic to α (resp., β).) Applications of Theorem 9 for general context-free languages can be found in [13, 136]. In [136] it is shown that one can decide in polynomial time whether a context-free grammar with terminal alphabet Σ ∪ Σ̄ (where Σ̄ = {ā | a ∈ Σ} is a disjoint copy of Σ) only generates well-bracketed strings. Here we view ā as the closing bracket of a, so for instance abb̄cc̄ā is well-bracketed but ab is not. In [13], this result is generalized as follows. Assume that R is a terminating and confluent string rewriting system over an alphabet Σ such that the presented monoid Σ∗/R is cancellative. Then by [13] one can decide in polynomial time for a given context-free grammar G whether L(G) is containe... |

2 | Compressed word problems in HNN-extensions and amalgamated products.
- Haubold, Lohrey
- 2011
Citation Context ...t makes sense to design algorithms that directly operate on the compressed string representation in order to save the time and space for (de)compression. Such a scenario can be found for instance in large genome databases or XML processing. • Large and often highly compressible strings may appear as intermediate data structures in algorithms. Here, one may try to store a compressed representation of these intermediate data structures and to process this representation. This may lead to more efficient algorithms. Examples for this strategy can be found for instance in combinatorial group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as to speed up edit distance computation [31, 59]. When we talk about algori... |

2 | Unified compression-based acceleration of edit-distance computation.
- Hermelin, Landau, et al.
- 2010
Citation Context ... group theory [57, 58, 95, 97, 125], computational topology [37, 122, 124], interprocedural analysis [52], and bisimulation checking [61, 79]. • In some situations it makes sense to compute in a first phase a compressed representation of an input string, which makes regularities in the string explicit. These regularities may be exploited in a second phase for speeding up an algorithm. This principle is known as acceleration by compression. It was recently applied in order to speed up the Viterbi algorithm for analyzing hidden Markov models [83] as well as to speed up edit distance computation [31, 59]. When we talk about algorithms on compressed strings, we have to make precise the compressed representation we want to use. Such a representation should have two properties: (i) it should cover many compression schemes from practice and (ii) it should be mathematically easy to handle. Straight-line programs (SLPs) have both of these properties. An SLP is a context-free grammar that generates exactly one word. In an SLP, repeated subpatterns in a string have to be represented only once by introducing a non-terminal for the pattern. It is not difficult to see that with an SLP consisting of n ... |

2 | Isomorphism of regular trees and words.
- Lohrey, Mathissen
- 2011
Citation Context ...ammar with terminal alphabet Σ ∪ Σ̄ (where Σ̄ = {ā | a ∈ Σ} is a disjoint copy of Σ) only generates well-bracketed strings. Here we view ā as the closing bracket of a, so for instance abb̄cc̄ā is well-bracketed but ab is not. In [13], this result is generalized as follows. Assume that R is a terminating and confluent string rewriting system over an alphabet Σ such that the presented monoid Σ∗/R is cancellative. Then by [13] one can decide in polynomial time for a given context-free grammar G whether L(G) is contained in the set of all strings over Σ that can be reduced with R to the empty word. In [94], Theorem 9 is used in order to solve the isomorphism problem for regular words (a particular class of colored linear orders) in polynomial time. A regular word is given by a set of equations {Xi := wi | 1 ≤ i ≤ n}, where every right-hand side wi is a word over the variables Xi and terminal letters. This is an SLP, where we do not require condition (2) (acyclicity) from Definition 1. Starting from the distinguished variable X1, we obtain an infinite derivation tree, where every leaf is labelled with a terminal symbol. The generated regular word is obtained by taking the natural left-to-right or... |

1 | Dynamic pattern matching.
- Alstrup, Brodal, et al.
- 1998
Citation Context ...s. A signature corresponds to a nonterminal of an SLP. The signature of a string is computed by iteratively breaking up the sequence into small blocks, which are encoded by integers using a pairing function. Adding the concatenation xy of two strings x and y to the data structure needs time O(log n(log m log∗ m + log n)) for the mth operation, where n is the length of the resulting string (hence, log(n) ≤ m). This leads to a cubic time algorithm for checking equality of SLP-compressed strings. An improvement of the data structure from [105] was obtained by Alstrup, Brodal, and Rauhe [5]; see [4] for a long version. There, the complexity of adding the concatenation xy of two strings x and y to the data structure is improved to O(log n(log∗ m + log |Σ|)), where n and m have the same meaning as above and Σ is the underlying alphabet. This leads to the complexity of O(n² log∗ n) (where n = |A| + |B|) for checking equality of SLP-compressed strings, which is currently the best upper bound. One can view the algorithms from [5, 105] as efficient methods for transforming SLPs A and B into canonical SLPs A′ and B′ such that eval(A) = eval(B) if and only if A′ and B′ are identical. One should note that th... |
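A simpler randomized alternative to these deterministic canonical forms (not the method of [5, 105]) is to compare Karp–Rabin fingerprints, which can be computed bottom-up from an SLP with one modular operation per production, using φ(xy) = φ(x) + B^|x| · φ(y) mod Q:

```python
import random

Q = (1 << 61) - 1            # a Mersenne prime modulus
B = random.randrange(2, Q)   # random base; distinct strings collide only with
                             # probability at most (length / Q)

def fingerprint(rules, start='S'):
    """Return (hash, B^length mod Q) of eval(start) for a CNF SLP;
    rules map a nonterminal to (Y, Z) or to a single terminal character."""
    memo = {}
    def fp(x):
        if x not in memo:
            rhs = rules[x]
            if isinstance(rhs, str):
                memo[x] = (ord(rhs) % Q, B % Q)
            else:
                (h1, p1), (h2, p2) = fp(rhs[0]), fp(rhs[1])
                memo[x] = ((h1 + p1 * h2) % Q, (p1 * p2) % Q)
        return memo[x]
    return fp(start)

a = {'S': ('A', 'B'), 'A': 'a', 'B': ('A', 'A')}   # eval = 'a' + 'aa' = 'aaa'
b = {'S': ('B', 'A'), 'A': 'a', 'B': ('A', 'A')}   # eval = 'aa' + 'a' = 'aaa'
print(fingerprint(a) == fingerprint(b))            # True
```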

1 | Straight-line programs: A practical test.
- Burmistrov, Khvorost
- 2011
Citation Context ...m that computes from a given string w an SLP A for w of size o(log(|w|)/ log log(|w|)) · opt(w) would imply major progress on the problem of computing shortest addition chains, see [23]. In this problem, a sequence k1, . . . , km of binary encoded natural numbers is given, and one is looking for a smallest circuit over (N,+) such that every ki is the value of some gate. Currently the best approximation algorithm for this problem is by Yao [143] (from 1976) and its approximation ratio is O(log n/ log log n), where n = k1 + · · · + km. Some optimizations of Rytter's algorithm can be found in [17]. Another grammar-based compression algorithm with linear running time (for a fixed alphabet) and an approximation ratio of O(log(n/opt(w))) was presented by Sakamoto [121]. His algorithm is based on the RePair compressor [78], which is another popular grammar-based compressor. The algorithms from [23, 119, 121] all use space Ω(n) in order to generate an SLP for a string of length n, which might be a problem for large input texts. A grammar-based compression algorithm with an approximation ratio of O(log²(n)) that runs in linear time and needs no more space than the size of the output grammar ... |
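For intuition on addition chains: for a single number k, the binary method already yields a chain of length at most 2⌊log₂ k⌋ in which every element is the sum of two earlier elements; a sketch:

```python
def binary_addition_chain(k):
    """Addition chain for k >= 1 via the binary method: scan the bits of k after
    the leading 1, doubling for every bit and adding 1 for every set bit."""
    chain = [1]
    for bit in bin(k)[3:]:            # bits of k after the leading '0b1'
        chain.append(chain[-1] * 2)   # doubling: c = c' + c'
        if bit == '1':
            chain.append(chain[-1] + 1)
    return chain

print(binary_addition_chain(15))  # [1, 2, 3, 6, 7, 14, 15]
```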

1 | Multiplying matrices faster than Coppersmith-Winograd.
- Vassilevska Williams
- 2012
Citation Context ...e is a 1-entry at a position (i, j), where i is an initial state of A and j is a final state of A. In this way, we can check whether eval(A) ∈ L(A) using |A| many multiplications of boolean (n × n)-matrices, which leads to the time bound O(|A| · n³). Actually, by using a better algorithm for boolean matrix multiplication than the standard n³ school method, the time bound O(|A| · n³) can be improved. For about 20 years, the Coppersmith–Winograd algorithm was the asymptotically best algorithm for matrix multiplication with a complexity of O(n^2.3755). This bound was recently improved by Vassilevska Williams [29] to O(n^2.3727). For a given deterministic finite automaton A with n states and an SLP A, one can verify eval(A) ∈ L(A) in time O(|A| · n) [115]. Instead of |A| many boolean matrix multiplications, only |A| compositions of functions from {1, . . . , n} to {1, . . . , n} are necessary in the deterministic case. In [12] the following generalization has been shown: From a given SLP A over an alphabet Σ and a deterministic finite state transducer τ with input alphabet Σ and n states, one can compute in time O(|A| · n) an SLP B of size O(|A| · n) for the output string τ(eval(A)). The second statemen... |
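The matrix approach described above can be sketched directly: assign to every SLP nonterminal the boolean reachability matrix of the word it derives, with one matrix product per production; the dict encodings are assumptions of this sketch:

```python
# NFA with states 0..n-1; delta[a] is the boolean transition matrix of letter a
# (rows = source state, columns = target state).

def bool_mult(M, N):
    """Boolean product of two square matrices (the n^3 school method)."""
    n = len(M)
    return [[any(M[i][k] and N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compressed_membership(rules, delta, init, final, start='S'):
    """Is eval(start) in L(NFA)?  One boolean matrix product per production."""
    memo = {}
    def mat(x):
        if x not in memo:
            rhs = rules[x]
            memo[x] = delta[rhs] if isinstance(rhs, str) else bool_mult(mat(rhs[0]), mat(rhs[1]))
        return memo[x]
    M = mat(start)
    return any(M[i][f] for i in init for f in final)

# NFA for (ab)^*:  state 0 --a--> 1,  state 1 --b--> 0
delta = {'a': [[False, True], [False, False]],
         'b': [[False, False], [True, False]]}
rules = {'S': ('A', 'A'), 'A': ('B', 'C'), 'B': 'a', 'C': 'b'}   # eval(S) = 'abab'
print(compressed_membership(rules, delta, {0}, {0}))  # True: 'abab' is in (ab)^*
```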

1 | Speeding-up $q$-gram mining on grammar-based compressed texts.
- Goto, Bannai, et al.
- 2012
Citation Context ...f-index is to store an SLP A with small space overhead in such a form that, for a given uncompressed pattern p, one can quickly list all occurrences of p in eval(A). Bille et al. [15] present an efficient algorithm for approximate compressed pattern matching for SLP-compressed strings: here, for a given pattern p, SLP T, and k ∈ N, the goal is to find all occurrences of factors of eval(T) that have distance at most k from p with respect to a certain distance measure (e.g. Hamming distance or edit distance). The problem of computing q-gram frequencies for SLP-compressed strings is studied in [49, 50]. A q-gram is just a string of length q. An algorithm that computes, in time O(q · n) for a Chomsky normal form SLP A of size n, a list with the frequencies of all q-grams occurring in eval(A) is presented in [49]. This algorithm can be seen as a refinement of the algorithm for semi-compressed pattern matching outlined above. In [103], it is shown that the length of the longest common factor of two SLP-compressed strings can be computed in polynomial time. 7 Leaf languages and string compression In this section, we discuss a technique that turned out to be useful for deriving lower complexity bo...
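The idea behind q-gram counting on a Chomsky normal form SLP can be sketched as follows (a simplified Python illustration with our own names, assuming q ≥ 2): every q-gram occurrence crosses the junction of exactly one derivation-tree node, so it suffices to enumerate, for each production A → BC, the q-grams crossing the boundary between eval(B) and eval(C), weighted by how often A occurs in the derivation tree.

```python
from collections import Counter

def qgram_frequencies(rules, start, q):
    """Count the q-grams of eval(start) without decompressing.

    rules maps each nonterminal either to a terminal character or to a
    pair (B, C) of nonterminals (Chomsky normal form); assumes q >= 2."""
    # topological order with children before parents
    order, seen = [], set()
    def visit(a):
        if a in seen:
            return
        seen.add(a)
        if isinstance(rules[a], tuple):
            visit(rules[a][0])
            visit(rules[a][1])
        order.append(a)
    visit(start)

    # prefix/suffix of length at most q-1 of each nonterminal's expansion
    pref, suf = {}, {}
    for a in order:
        rhs = rules[a]
        if isinstance(rhs, tuple):
            b, c = rhs
            pref[a] = (pref[b] + pref[c])[:q - 1]
            suf[a] = (suf[b] + suf[c])[-(q - 1):]
        else:
            pref[a] = suf[a] = rhs

    # occ[a] = number of occurrences of a in the derivation tree
    occ = Counter({start: 1})
    for a in reversed(order):
        rhs = rules[a]
        if isinstance(rhs, tuple):
            occ[rhs[0]] += occ[a]
            occ[rhs[1]] += occ[a]

    # each junction contributes its crossing q-grams, weighted by occ
    freq = Counter()
    for a in order:
        rhs = rules[a]
        if isinstance(rhs, tuple):
            t, s = suf[rhs[0]] + pref[rhs[1]], len(suf[rhs[0]])
            for i in range(max(0, s - q + 1), s):
                if i + q <= len(t):
                    freq[t[i:i + q]] += occ[a]
    return freq
```

For the SLP {S → ZZ, Z → XY, X → a, Y → b} (eval = "abab") and q = 2, this yields frequencies ab: 2 and ba: 1.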

1 | Compressed conjugacy and the word problem for outer automorphism groups of graph groups.
- Haubold, Lohrey, et al.
- 2010

1 | Compressed membership for NFA (DFA) with compressed labels is in NP (P).
- Jez
- 2012
Citation Context ...pn = q, and w = eval(A0) · · · eval(An−1). The language L(A) ⊆ Σ∗ is the set of all words that label a path from the initial state q0 to some final state qf ∈ F. A compressed deterministic finite automaton (CDFA for short) is a CNFA A = (Q, Σ, δ, q0, F) such that for every state p ∈ Q and all transitions (p, A1, q1), (p, A2, q2) ∈ δ, neither eval(A1) is a prefix of eval(A2), nor eval(A2) is a prefix of eval(A1). The compressed membership problem for CNFAs (resp., CDFAs) is the following decision problem: input: a CNFA (resp., CDFA) A and an SLP A; question: does eval(A) ∈ L(A) hold? Theorem 20 ([66]) The compressed membership problem for CNFAs is NP-complete, whereas the compressed membership problem for CDFAs is P-complete. In his proof of Theorem 20, Jez developed his recompression technique, which he later applied in [67] to get his O((|T| + |P|) log(M) log(|T| + |P|))-algorithm for fully compressed pattern matching, see Section 6. The order of a word u is the largest number n such that u can be written as u = v^n w, where w is a prefix of v. If A1, . . . , An is a list of all SLPs that occur as labels in the CNFA A, then we set ord(A) = max{ord(eval(Ai)) | 1 ≤ i ≤ n}. Note that ord(A) is i...
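The order of an (uncompressed) word is easy to compute in linear time, a small Python sketch (function name is our own): u = v^n w with w a prefix of v holds exactly when |v| is a period of u, so the order is |u| divided by the smallest period, which can be read off the KMP failure function.

```python
def word_order(u):
    """Largest n such that u = v^n w with w a prefix of v."""
    n = len(u)
    if n == 0:
        return 0
    # KMP failure function: fail[i] = length of longest proper
    # border of u[:i]
    fail = [0] * (n + 1)
    k = 0
    for i in range(1, n):
        while k and u[i] != u[k]:
            k = fail[k]
        if u[i] == u[k]:
            k += 1
        fail[i + 1] = k
    return n // (n - fail[n])  # n - fail[n] is the smallest period
```

For example, "abaabaaba" = ("aba")^3 has order 3, and "ababa" = ("ab")^2 · "a" has order 2.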

1 | Compressed word problems for inverse monoids.
- Lohrey
- 2011
Citation Context ...ime) evaluates to the zero polynomial. This problem is known as algebraic identity testing. It belongs to coRP by [63]. Whether algebraic identity testing belongs to P is a major open problem in complexity theory; see [1] for a survey. It also makes sense to consider the compressed word problem for a finitely generated monoid M. In that case, the input consists of two SLPs A and B over a generating set of M, and it is asked whether eval(A) = eval(B) in M. For instance, Theorem 9 says that the compressed word problem for a finitely generated free monoid can be solved in polynomial time. In [87], it was shown that the compressed word problem for a finitely generated free inverse monoid of rank r ≥ 2 is complete for Π^p_2, the universal second level of the polynomial time hierarchy. It can be defined as the class of all languages L ⊆ Σ∗ such that there exist a polynomial-time decidable set P ⊆ Σ∗ × {0, 1}∗ × {0, 1}∗ and a polynomial p(x) such that for all u ∈ Σ∗ we have ({0, 1}^≤m denotes the set of binary words of length at most m): u ∈ L ⇐⇒ ∀v ∈ {0, 1}^≤p(|u|) ∃w ∈ {0, 1}^≤p(|u|) : (u, v, w) ∈ P. The class Π^p_2 contains both NP and coNP. Recently, power circuits [107], which are a compre...
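The randomized flavor of algebraic identity testing can be illustrated by a Schwartz–Zippel style sketch in Python (names and circuit encoding are our own). This is only a simplified picture: an arithmetic circuit can encode polynomials of exponential degree and doubly exponential coefficients, which the actual coRP algorithm must handle more carefully.

```python
import random

def is_zero_poly(circuit, variables, trials=20, p=(1 << 61) - 1):
    """Randomized identity test: evaluate the circuit at random points
    modulo the Mersenne prime 2^61 - 1.  A nonzero answer witnesses a
    nonzero polynomial; all-zero answers mean 'zero with high
    probability' (for polynomials of degree well below p).

    circuit is a list of gates: ('var', name), ('const', c),
    ('+', i, j), ('-', i, j) or ('*', i, j), where i, j index earlier
    gates; the last gate is the output."""
    for _ in range(trials):
        env = {x: random.randrange(p) for x in variables}
        vals = []
        for g in circuit:
            if g[0] == 'var':
                vals.append(env[g[1]])
            elif g[0] == 'const':
                vals.append(g[1] % p)
            elif g[0] == '+':
                vals.append((vals[g[1]] + vals[g[2]]) % p)
            elif g[0] == '-':
                vals.append((vals[g[1]] - vals[g[2]]) % p)
            else:  # '*'
                vals.append((vals[g[1]] * vals[g[2]]) % p)
        if vals[-1] != 0:
            return False  # witnessed a nonzero value
    return True  # zero in all trials
```

For instance, a circuit computing (x + y)(x − y) − x² + y² is correctly reported as the zero polynomial.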