Normalization of edit sequences for text synchronization

Rafael C. Carrasco; Alexander Sánchez Díaz

RAIRO - Theoretical Informatics and Applications (2011)

  • Volume: 45, Issue: 2, page 235-248
  • ISSN: 0988-3754

Abstract

top
It often occurs that local copies of a text are modified by users but that the local modifications are not synchronized (thus allowing the merged text to become the source for later editions) until later when, for instance the network connection is reestablished. Since text editions usually affect a small fraction of the whole content, the history of edit operations provides a compact representation of the modified file. In this paper, we define a normal form for these records which will permit for the comparison of all text files that have been obtained by editing a common source S when the difference between each output file Oi and the source file is given as a sequence Li of edit operations. We show that the normalized sequence is unique for all the equivalent text editions and provide efficient procedures with which to compute this normal form and to obtain the edit sequence LM transforming S into a merged file M which integrates all the local modifications. We also discuss how these normalization can be integrated into the operational transformation paradigm for optimistic replication.

How to cite

top

Carrasco, Rafael C., and Sánchez Díaz, Alexander. "Normalization of edit sequences for text synchronization." RAIRO - Theoretical Informatics and Applications 45.2 (2011): 235-248. <http://eudml.org/doc/276340>.

@article{Carrasco2011,
abstract = { It often occurs that local copies of a text are modified by users but that the local modifications are not synchronized (thus allowing the merged text to become the source for later editions) until later when, for instance the network connection is reestablished. Since text editions usually affect a small fraction of the whole content, the history of edit operations provides a compact representation of the modified file. In this paper, we define a normal form for these records which will permit for the comparison of all text files that have been obtained by editing a common source S when the difference between each output file Oi and the source file is given as a sequence Li of edit operations. We show that the normalized sequence is unique for all the equivalent text editions and provide efficient procedures with which to compute this normal form and to obtain the edit sequence LM transforming S into a merged file M which integrates all the local modifications. We also discuss how these normalization can be integrated into the operational transformation paradigm for optimistic replication. },
author = {Carrasco, Rafael C., Sánchez Díaz, Alexander},
journal = {RAIRO - Theoretical Informatics and Applications},
keywords = {Edit distance; text synchronization; reconciliation of replicas; edit distance},
language = {eng},
month = {6},
number = {2},
pages = {235-248},
publisher = {EDP Sciences},
title = {Normalization of edit sequences for text synchronization},
url = {http://eudml.org/doc/276340},
volume = {45},
year = {2011},
}

TY - JOUR
AU - Carrasco, Rafael C.
AU - Sánchez Díaz, Alexander
TI - Normalization of edit sequences for text synchronization
JO - RAIRO - Theoretical Informatics and Applications
DA - 2011/6//
PB - EDP Sciences
VL - 45
IS - 2
SP - 235
EP - 248
AB - It often occurs that local copies of a text are modified by users but that the local modifications are not synchronized (thus allowing the merged text to become the source for later editions) until later when, for instance the network connection is reestablished. Since text editions usually affect a small fraction of the whole content, the history of edit operations provides a compact representation of the modified file. In this paper, we define a normal form for these records which will permit for the comparison of all text files that have been obtained by editing a common source S when the difference between each output file Oi and the source file is given as a sequence Li of edit operations. We show that the normalized sequence is unique for all the equivalent text editions and provide efficient procedures with which to compute this normal form and to obtain the edit sequence LM transforming S into a merged file M which integrates all the local modifications. We also discuss how these normalization can be integrated into the operational transformation paradigm for optimistic replication.
LA - eng
KW - Edit distance; text synchronization; reconciliation of replicas; edit distance
UR - http://eudml.org/doc/276340
ER -

References

top
  1. S. Balasubramaniam and B.C. Pierce, What is a file synchronizer, in Forth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom'98) (1998) 98–1085.  
  2. P. Bille, A survey on tree edit distance and related problems. Theoret. Comput. Sci.337 (2005) 217–239.  
  3. T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to algorithms. 6th edition, MIT Press and McGraw-Hill Book Company (1992).  
  4. R. Cox and W. Josephson, File synchronization with vector time pairs. Technical Report MIT-CSAIL-TR-2005-014 and MIT-LCS-TM-650, MIT Computer Science and Artificial Intelligence Laboratory (2005).  
  5. C.A. Ellis and S.J. Gibbs, Concurrency control in groupware systems, in Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data. J. Clifford, B.G. Lindsay and D. Maier, Eds. ACM Press (1989) 399–407.  
  6. J.N. Foster, M.B. Greenwald, C. Kirkegaard, B.C. Pierce and A. Schmitt, Exploiting schemas in data synchronization. J. Comput. System Sci.73 (2007) 669–689.  
  7. A. Imine, M. Rusinowitch, G. Oster and P. Molli, Formal design and verification of operational transformation algorithms for copies convergence. Theoret. Comput. Sci.351 (2006) 167–183.  
  8. A.-M. Kermarrec, A.I.T. Rowstron, M. Shapiro and P. Druschel, The IceCube approach to the reconciliation of divergent replicas, in PODC 2001, Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing. ACM (2001) 210–218.  
  9. S. Khanna, K. Kunal and B.C. Pierce, A formal investigation of diff3, in Foundations of Software Technology and Theoretical Computer Science (FSTTCS), Arvind and Prasad, Eds. (2007).  
  10. T. Lindholm, A three-way merge for XML documents, in DocEng '04: Proceedings of the 2004 ACM symposium on Document engineering. ACM, New York, NY, USA (2004) 1–10.  
  11. E. Lippe and N. van Oosterom, Operation-based merging. SIGSOFT Softw. Eng. Notes17 (1992) 78–87.  
  12. B. Lushman, and G.V. Cormack, Proof of correctness of Ressel's adOPTed algorithm. Inform. Process. Lett.86 (2003) 303–310.  
  13. W.J. Masek and M.S. Paterson, A faster algorithm computing string edit distances. J. Comput. System Sci.20 (1980) 18–31.  
  14. F. Mattern, Virtual time and global states of distributed systems, in Proc. Workshop on Parallel and Distributed Algorithms, M. Cosnard, Ed., Chateau de Bonas, France. Elsevier (1988) 215–226.  
  15. B.C. Pierce and J. Vouillon, What's in Unison? A formal specification and reference implementation of a file synchronizer. Technical Report MS-CIS-03-36, Dept. of Computer and Information Science, University of Pennsylvania (2004).  
  16. N. Ramsey and E. Csirmaz, An algebraic approach to file synchronization. SIGSOFT Softw. Eng. Notes26 (2001) 175–185.  
  17. M. Ressel, D. Nitsche-Ruhland and R. Gunzenhäuser, An integrating, transformation-oriented approach to concurrency control and undo in group editors, in CSCW '96, Proceedings of the ACM 1996 Conference on Computer Supported Cooperative Work. Boston, MA, USA, ACM (1996) 288–297.  
  18. Y. Saito and M. Shapiro, Optimistic replication. ACM Comput. Surv.37 (2005) 42–81.  
  19. H. Shen and C. Sun, A log compression algorithm for operation-based version control systems, in Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment, COMPSAC '02. IEEE Computer Society Washington, DC, USA (2002) 867–872.  
  20. T. Suel and N. Memon, Algorithms for delta compression and remote file synchronization, in Lossless Compression Handbook, K. Sayood, Ed. Academic Press (2003) 269–290.  
  21. C. Sun and C.A. Ellis, Operational transformation in real-time group editors: Issues, algorithms, and achievements, in CSCW98, Proceedings of the ACM 1998 Conference on Computer Supported Cooperative Work. ACM (1998) 59–68.  
  22. C. Sun, X. Jia, Y. Zhang, Y. Yang and D. Chen, Achieving convergence, causality preservation, and intention preservation in real-time cooperative editing systems. ACM Trans. Computer-Human Interaction5 (1998) 63–108.  
  23. W.F. Tichy, The string-to-string correction problem with block moves. ACM Trans. Comput. Syst.2 (1984) 309–321.  
  24. A. Tridgell and P. Mackerras, The rsync algorithm. Technical Report TR-CS-96-05, Department of Computer Science, Faculty of Engineering and Information Technology, The Australian National University (1996).  
  25. K. Zhang and D. Shasha, Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput.18 (1989) 1245–1262.  

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.