CS561 Web Data Management
Projects
Spring 2013
Professor: Vassilis Christophides
Teaching Assistant: Michalis Chortis
E-mails: {christop, mhortis}@ics.forth.gr
Course Hours: Tuesday 3-5 PM and Wednesday 11 AM-1 PM
Room: H.204
Office Hours: After the lectures or by appointment
Course Credits: 4
This is a one- or two-person comprehensive survey project in which you perform an in-depth analysis of the research literature in the area covered by the course. The key to the success of this project is your creativity and dedication. Specifically, you need to do the following:
- Choose a research topic for your project from the list of papers in the Project Papers section of the course web page.
- Each group of students must choose different papers, and every student must choose at least 2 papers.
- Read a sufficient number of papers (usually more than the ones you will present) to perform an in-depth analysis of the research they describe. In particular, focus on the following:
  - Define criteria by which published results can be classified. These may include the types of problems solved, the types of methods used, the types of frameworks/architectures, etc.
  - Give a comprehensive and systematic description of the technical aspects of the papers, including concepts, ideas, algorithms, methodologies, experimental results, etc.
  - Place particular emphasis on your personal critique of the papers' material.
- Give a 30- to 45-minute presentation in class; your report should follow the same organization. [Presentation Guidelines]
- Write a technical report covering all of the items above. [Guidelines for the presentation of written work]
Requirements
- A survey must analyze a good number of papers (minimum 2 per student) related to the selected topic. The survey report and the presentation will be evaluated on both their breadth (i.e., how complete the coverage is) and their depth (i.e., how much insight they bring out). Several aspects are taken into account when grading the presentation. Understanding of the papers (12%) is graded as follows:
  - 8% technique/approach,
  - 2% background, and
  - 2% shortcomings/open problems.
  Delivery of the talk (8%) is graded as follows:
  - 3% slides,
  - 3% speech, and
  - 2% handling of questions and problems.
- The report must have the following sections: Abstract (up to 250 words), Introduction, the main technical sections, Conclusion/Contributions (with respect to the related work), and Bibliographic References. In short, you should follow the structure of research papers such as those you have read.
- Both the report and the presentation should address several, or possibly all, of the following issues:
  - What is the most important point of each paper?
  - Why is the work notable, novel, or neither?
  - Why are the problems tackled by the paper important or not?
  - Why is the proposed solution potentially useful or not?
  - Are the assumptions clearly specified, and are they reasonable and practically valid?
  - Point out additional contexts in which the same idea or technology could be applied, and relate the work to other papers you find during your literature search.
  - How are the proposed ideas evaluated, and how thorough is that evaluation?
  - Identify possible future research tasks that would make the proposed work even better, develop a different solution strategy, drop some of the given assumptions, and so on.
- The report should be between 15 and 30 pages long.
What to hand in
- The electronic version of your presentation and printed handouts (before the presentation in class).
- The electronic version and a hard copy of your report.
Hint
When you study the papers, we suggest that you make a list of the points you find particularly confusing, ambiguous, interesting, controversial, etc., and try to formulate your own comments, possible answers, and examples that address them. These points and related material can be part of your report. You may also be asked to address them in class during your presentation, so keep your critiques and other relevant information in mind when you arrive.
Advice on research and writing
[Data Integration in the Web of Data]
[Web Data Storage and Access]
[Benchmarking]
- Crawling
  - Sindice.com: Weaving the Open Linked Data
    Giovanni Tummarello, Renaud Delbru, Eyal Oren, ISWC/ASWC 2007
  - LDSpider - An open-source crawling framework for the Web of Linked Data
    Robert Isele, Jurgen Umbrich, Christian Bizer, Andreas Harth, ISWC Posters & Demos 2010
  - Semantic Navigation on the Web of Data: Specification of Routes, Web Fragments and Actions
    Valeria Fionda, Claudio Gutierrez, Giuseppe Pirro, WWW 2012
  - MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data
    Andreas Harth, Jurgen Umbrich, Stefan Decker, ISWC 2006
  - Searching Semantic Web Objects Based on Class Hierarchies
    Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, LDOW 2008
  - Swoogle: A Search and Metadata Engine for the Semantic Web
    Li Ding, Timothy W. Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, Joel Sachs, CIKM 2004
  - Effective Page Refresh Policies For Web Crawlers
    Junghoo Cho, Hector Garcia-Molina, ACM Trans. Database Syst. 2003
- SPARQL Federation and Query Mediation
  - Efficient Distributed Query Processing for Autonomous RDF Databases
    Fabian Prasser, Alfons Kemper, Klaus A. Kuhn, EDBT 2012
  - Linked Data Query Processing Strategies
    Gunter Ladwig, Thanh Tran, ISWC 2010
  - Pay-as-you-go Data Integration for Linked Data: opportunities, challenges and architectures
    Norman W. Paton, Klitos Christodoulou, Alvaro A. A. Fernandes, Bijan Parsia, Cornelia Hedeler, SWIM 2012
  - Rewriting Queries on SPARQL Views
    Wangchao Le, Songyun Duan, Anastasios Kementsietsidis, Feifei Li, Min Wang, WWW 2011
  - SPARQL-RW: Transparent Query Access over Mapped RDF Data Sources
    Konstantinos Makris, Nikos Bikakis, Nektarios Gioldasis, Stavros Christodoulakis, EDBT 2012
  - Efficient Query Answering against Dynamic RDF Databases
    Francois Goasdoue, Ioana Manolescu, Alexandra Roatis, EDBT 2013
- Data Mapping and Entity Resolution
  - A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces
    George Papadakis, Ekaterini Ioannou, Themis Palpanas, Claudia Niederee, Wolfgang Nejdl, IEEE TKDE 2012
  - To Compare or Not to Compare: Making Entity Resolution more Efficient
    George Papadakis, Ekaterini Ioannou, Claudia Niederee, Themis Palpanas, Wolfgang Nejdl, SWIM 2011
- Data Quality and Provenance
- RDF Data Exchange
- SPARQL Engines and Optimization
  - Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins
    Thomas Neumann, Guido Moerkotte, ICDE 2011
  - Database foundations for scalable RDF processing
    Katja Hose, Ralf Schenkel, Martin Theobald, Gerhard Weikum, Reasoning Web 2011
  - Heuristics-based Query Optimisation for SPARQL
    Petros Tsialiamanis, Lefteris Sidirourgos, Irini Fundulaki, Vassilis Christophides, Peter A. Boncz, EDBT 2012
  - RDF-3X: a RISC-style Engine for RDF
    Thomas Neumann, Gerhard Weikum, PVLDB 2008
  - Scalable Join Processing on Very Large RDF Graphs
    Thomas Neumann, Gerhard Weikum, SIGMOD Conference 2009
  - Storing and Indexing Massive RDF Data Sets
    Yongming Luo, Francois Picalausa, George H.L. Fletcher, Jan Hidders, Stijn Vansummeren, Semantic Search over the Web 2012
  - gStore: Answering SPARQL Queries via Subgraph Matching
    Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Ozsu, Dongyan Zhao, PVLDB 2011
  - Static Analysis and Optimization of Semantic Web Queries
    Andres Letelier, Jorge Perez, Reinhard Pichler, Sebastian Skritek, PODS 2012
- Continuous Query Processing
  - An Execution Environment for C-SPARQL Queries
    Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Michael Grossniklaus, EDBT 2010
  - Linked Stream Data Processing
    Danh Le Phuoc, Josiane Xavier Parreira, Manfred Hauswirth, Reasoning Web 2012
  - Querying RDF Streams with C-SPARQL
    Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus, SIGMOD Record 2010
- Multi-Query Processing
- Large Scale Data Management
  - From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra
    HyeongSik Kim, Padmashree Ravindra, Kemafor Anyanwu, PVLDB 2011
  - Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing
    Mohammad Farhan Husain, James P. McGlothlin, Mohammad M. Masud, Latifur R. Khan, Bhavani M. Thuraisingham, IEEE Transactions on Knowledge and Data Engineering 2011
  - Large-scale Linked Data Processing - Cloud Computing to the Rescue?
    Michael Hausenblas, Robert Grossman, Andreas Harth, Philippe Cudre-Mauroux, CLOSER 2012
  - RDF Data Management in the Amazon Cloud
    Francesca Bugiotti, Francois Goasdoue, Zoi Kaoudi, Ioana Manolescu, EDBT/ICDT Workshops 2012
  - Scalable SPARQL Querying of Large RDF Graphs
    Jiewen Huang, Daniel J. Abadi, Kun Ren, PVLDB 2011
  - CumulusRDF: Linked Data Management on Nested Key-Value Stores
    Gunter Ladwig, Andreas Harth, SSWS 2011
  - Data Intensive Query Processing for Large RDF Graphs Using Cloud Computing Tools
    Mohammad Farhan Husain, Latifur Khan, Murat Kantarcioglu, Bhavani M. Thuraisingham, IEEE CLOUD 2010
  - Efficient Processing of RDF Graph Pattern Matching on MapReduce Platforms
    Padmashree Ravindra, Seokyong Hong, HyeongSik Kim, Kemafor Anyanwu, DataCloud-SC 2011
  - PigSPARQL: Mapping SPARQL to Pig Latin
    Alexander Schatzle, Martin Przyjaciel-Zablocki, Georg Lausen, SWIM 2011
  - Rya: A Scalable RDF Triple Store for the Clouds
    Roshan Punnoose, Adina Crainiceanu, David Rapp, Cloud-I 2012
  - RDFPath: Path Query Processing on Large RDF Graphs with MapReduce
    Martin Przyjaciel-Zablocki, Alexander Schatzle, Thomas Hornung, Georg Lausen, ESWC Workshops 2011
  - Towards Effective Partition Management for Large Graphs
    Shengqi Yang, Xifeng Yan, Bo Zong, Arijit Khan, SIGMOD Conference 2012
- Keyword-based and Top-K Querying
  - Efficient Execution of Top-K SPARQL Queries
    Sara Magliacane, Alessandro Bozzon, Emanuele Della Valle, ISWC 2012
  - Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends
    Andre Freitas, Edward Curry, Joao Gabriel Oliveira, Sean O'Riain, IEEE Internet Computing 2012
  - Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data
    Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano, ICDE 2009
  - Natural Language Questions for the Web of Data
    Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, Gerhard Weikum, EMNLP-CoNLL 2012
- Benchmarking
  - SP^2Bench: A SPARQL Performance Benchmark
    Michael Schmidt, Thomas Hornung, Georg Lausen, Christoph Pinkel, ICDE 2009
  - An Evaluation of Approaches to Federated Query Processing over Linked Data
    Peter Haase, Tobias Matha, Michael Ziller, I-SEMANTICS 2010
  - Apples and Oranges: A Comparison of RDF Benchmarks and Real RDF Datasets
    Songyun Duan, Anastasios Kementsietsidis, Kavitha Srinivas, Octavian Udrea, SIGMOD Conference 2011
  - Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?
    Gabriela Montoya, Maria-Esther Vidal, Oscar Corcho, Edna Ruckhaus, Carlos Buil Aranda, ISWC 2012
  - D1.2 Benchmarking RDF Storage Engines
    Ying Zhang, Pham Minh Duc, Fabian Groffen, Erietta Liarou, Peter Boncz, Martin Kersten, Jean Paul Calbimonte, Oscar Corcho, 2012
  - Column-Store Support for RDF Data Management: Not All Swans Are White
    Lefteris Sidirourgos, Romulo Goncalves, Martin L. Kersten, Niels Nes, Stefan Manegold, PVLDB 2008
  - MonetDB Release with Optimized Graph Path Processing
    Collaborative Project, 2012
  - On Generating Benchmark Data for Entity Matching
    Ekaterini Ioannou, Nataliya Rassadko, Yannis Velegrakis, Journal on Data Semantics 2012
  - An Empirical Study of Real-World SPARQL Queries
    Mario Arias, Javier D. Fernandez, Miguel A. Martinez-Prieto, Pablo de la Fuente, CoRR 2011
  - FedBench: A Benchmark Suite for Federated Semantic Data Query Processing
    Michael Schmidt, Olaf Gorlitz, Peter Haase, Gunter Ladwig, Andreas Schwarte, Thanh Tran, ISWC 2011