Native xml databases vs. relational databases in dealing with xml documents
In the XML standard, data are represented as unranked labeled ordered trees. Regular unranked tree automata provide a useful formalism for the validation of schemas enforcing regular structural constraints on XML documents. However some concrete application contexts need the expression of more general constraints than the regular ones. In this paper we propose a new framework in which context-free style structural constraints can be expressed and validated. This framework is characterized by: (i)...
In the paper an interface is proposed that combines flexible (fuzzy) querying and data mining functionality. The point of departure is the fuzzy querying interface designed and implemented previously by the present authors. It makes it possible to formulate and execute, against a traditional (crisp) database, queries containing imprecisely specified conditions. Here we discuss possibilities to extend it with some data mining features. More specifically, linguistic summarization of data (databases),...
We prove the undecidability of Core XPath 1.0 (CXP) [G. Gottlob and C. Koch, in Proc. of 17th Ann. IEEE Symp. on Logic in Computer Science, LICS ’02 (Copenhagen, July 2002). IEEE CS Press (2002) 189–202.] extended with an Inflationary Fixed Point (IFP) operator. More specifically, we prove that the satisfiability problem of this language is undecidable. In fact, the fragment of CXP+IFP containing only the self and descendant axes is already undecidable.
Digital trees or tries are a general purpose flexible data structure that implements dictionaries built on words. The present paper is focussed on the average-case analysis of an important parameter of this tree-structure, i.e., the stack-size. The stack-size of a tree is the memory needed by a storage-optimal preorder traversal. The analysis is carried out under a general model in which words are produced by a source (in the information-theoretic sense) that emits symbols. Under some natural assumptions...
Digital trees or tries are a general purpose flexible data structure that implements dictionaries built on words. The present paper is focussed on the average-case analysis of an important parameter of this tree-structure, i.e., the stack-size. The stack-size of a tree is the memory needed by a storage-optimal preorder traversal. The analysis is carried out under a general model in which words are produced by a source (in the information-theoretic sense) that emits symbols. Under some natural...
This paper addresses two problems lying at the intersection of geometric analysis and theoretical computer science: The non-linear isomorphic Dvoretzky theorem and the design of good approximate distance oracles for large distortion.We introduce the notion of Ramsey partitions of a finite metric space, and show that the existence of good Ramsey partitions implies a solution to the metric Ramsey problem for large distortion (also known as the non-linear version of the isomorphic Dvoretzky theorem,...
Codd defined the relational algebra [E.F. Codd, Communications of the ACM 13 (1970) 377–387; E.F. Codd, Relational completeness of data base sublanguages, in Data Base Systems, R. Rustin, Ed., Prentice-Hall (1972) 65–98] as the algebra with operations projection, join, restriction, union and difference. His projection operator can drop, permute and repeat columns of a relation. This permuting and repeating of columns does not really add expressive power to the relational algebra. Indeed, using the...
Codd defined the relational algebra [E.F. Codd, Communications of the ACM13 (1970) 377–387; E.F. Codd, Relational completeness of data base sublanguages, in Data Base Systems, R. Rustin, Ed., Prentice-Hall (1972) 65–98] as the algebra with operations projection, join, restriction, union and difference. His projection operator can drop, permute and repeat columns of a relation. This permuting and repeating of columns does not really add expressive power to the relational algebra. Indeed, ...
A new information system representation, which inherently represents indiscernibility is presented. The basic structure of this representation is a Binary Decision Diagram. We offer testing results for converting large data sets into a Binary Decision Diagram Information System representation, and show how indiscernibility can be efficiently determined. Furthermore, a Binary Decision Diagram is used in place of a relative discernibility matrix to allow for more efficient determination of the discernibility...