Research Projects

Project Summaries - Ongoing Projects

  • New Methods and Algorithms to Estimate the Selectivity of SQL LIKE Queries

    Oct. 10, 2017 - April 10, 2019

    Funding Institution: TUBITAK

    Principal Investigator: Ali Çakmak

    Accurate estimation of a query's cost and execution time is one of the major success indicators for database management systems. SQL allows users to express flexible queries on text-formatted data. The LIKE operator searches for a specified pattern in a string column (e.g., the predicate name LIKE ‘es%’ finds people whose names start with ‘es’). Estimating the selectivity of such flexible predicates accurately is vital for the query optimizer to choose an efficient execution plan. In this project, we will study the problem of estimating the selectivity of a LIKE predicate over a bag of strings. We propose a new type of pattern-based histogram to summarize the data distribution in a particular column. More specifically, we will first mine sequential patterns over a given string database and then construct a special histogram from the mined patterns. At query optimization time, these pattern-based histograms will be used to estimate the selectivity of a LIKE predicate. In addition, we will extend existing sequence mining techniques to compute more specific sequence patterns and thereby increase the estimation accuracy of the proposed framework. Orthogonal to these techniques, we will question the value of the metrics currently used in the literature to compare selectivity estimation methods. We argue that these metrics are unnecessarily specific and may mislead database practitioners and database management system developers. We will develop a new, practical benefit-based metric and use it to evaluate both the state-of-the-art techniques and the methods developed in this project. In this way, the project will give researchers and developers in the field a more realistic view of the alternative techniques.
    The proposed techniques will be assessed and compared to the state of the art along different dimensions on real and synthetic string datasets that are freely available to researchers. The methods developed in this project will enable database management systems to produce better query execution plans while using fewer resources, so flexible queries on text-based data will take less time to execute.
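As a rough illustration of how a histogram over string patterns can feed a LIKE estimate, the sketch below uses a deliberately simplified stand-in for the project's pattern-based histogram: instead of mined sequential patterns, it counts how many strings contain each short substring (with `^` and `$` added as assumed start and end markers) and multiplies the fractions for the literal pieces of the pattern, assuming they occur independently. The function names, the `max_len` cut-off, and the `default` fallback for unseen pieces are hypothetical choices for illustration, and only the `%` wildcard is handled.

```python
from collections import Counter


def build_histogram(column, max_len=3):
    """Count, for every substring of up to max_len characters, how many strings
    contain it. '^' and '$' mark string start and end, so prefixes and suffixes
    get their own entries (e.g. '^es' means "starts with es")."""
    counts = Counter()
    for value in column:
        s = "^" + value + "$"
        seen = set()
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                seen.add(s[i:j])
        counts.update(seen)  # each string counts at most once per substring
    return counts, len(column), max_len


def estimate_like_selectivity(pattern, hist, default=0.01):
    """Estimate the fraction of rows matching a LIKE pattern that uses only '%'.
    Literal pieces are looked up in the histogram and assumed independent."""
    counts, n, max_len = hist
    pieces = pattern.split("%")
    if not pattern.startswith("%"):
        pieces[0] = "^" + pieces[0]    # anchored at the start of the string
    if not pattern.endswith("%"):
        pieces[-1] = pieces[-1] + "$"  # anchored at the end of the string
    selectivity = 1.0
    for piece in pieces:
        # Pieces longer than max_len are cut into chunks, also assumed independent.
        for k in range(0, len(piece), max_len):
            chunk = piece[k:k + max_len]
            selectivity *= counts[chunk] / n if counts[chunk] else default
    return selectivity


names = ["esra", "esma", "emre", "ali", "deniz"]
hist = build_histogram(names)
print(estimate_like_selectivity("es%", hist))
```

For these five names, `'es%'` is estimated at 0.4, which matches the true fraction. For `'e%a'`, the sketch multiplies 0.6 (for `^e`) by 0.4 (for `a$`) and gets about 0.24, although the true selectivity is 0.4. Correlated pieces like these are one source of the estimation errors that richer, pattern-based summaries aim to reduce.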

  • Discovery of Genomic Factors that lead to High Grade Transformation in Human Cancers

    June 20, 2017 - June 20, 2019

    Funding Institution: TUBITAK

    Principal Investigator: Mehmet Baysan

    In this project, genomic factors that lead to increased aggressiveness in human tumors, especially gliomas, will be analyzed. Discovering the genomic factors that underlie high-grade transformation is essential. Recent developments in genomic technologies have resulted in the construction of large-scale, high-resolution cancer data sets. In this study, molecular subtypes that group similar samples will first be identified for each cancer in selected large-scale data sets. Then, low- and high-grade tumors will be compared within each subtype to identify the genes and pathways associated with tumor aggressiveness. Finally, the genomic factors identified for different subtypes will be compared to discover common factors that may be useful targets for clinical use.

  • Cloud-supported Adaptive Streaming of Modern Video

    Feb. 1, 2017 - Feb. 1, 2020

    Funding Institution: TUBITAK

    Principal Investigator: Shervin Shirmohammadi

    Video streaming has reached unprecedented levels: not only has global Internet traffic been growing by 21% annually, but up to 90% of this traffic is expected to be video by 2018. This places a heavy burden on the infrastructure providing the service, and it is the main reason content delivery networks are now implemented on cloud-based infrastructure, on the assumption that the cloud will provide a good degree of scalability. However, a number of difficult challenges remain unsolved because of two main realities: (1) modern types of video, such as Ultra High Definition (UHD), interactive, multi-view, tiled, 3D, and multi-segmented video, are reaching consumers and will make video processing much more complex than it is today; and (2) most consumers will access these videos not from personal computers or laptops on wired networks, but from mobile devices such as smartphones and tablets. By 2019, consumers are forecast to access the Internet through 11.5 billion such mobile devices. This creates hard challenges because of the battery and computing limitations of these devices and because their wireless access fluctuates far more than wired networks do. The aim of this project is to design algorithms and systems for modern video applications and to improve the quality, efficiency, and scalability of multimedia distribution systems to meet fast-growing demand. We will investigate the delivery of new types of video content and improve the quality of UHD, interactive, multi-view, tiled, and 3D video. Current adaptive streaming systems suffer from unfairness, instability, and under-utilization, and these problems will be exacerbated by richer multi-component video content and the growing number of mobile-connected devices.
    To meet the rapidly increasing demand, current 4G and future 5G wireless networks and the new HTTP/2 standard introduce complex architectural and algorithmic features. We will analyze the impact of these features on adaptive multimedia streaming systems and develop algorithms that maximize the utilization of cellular networks, improve the quality of multi-component video delivered over such networks, and minimize the energy consumption of mobile devices participating in video sessions. Furthermore, we will develop novel cloud-based algorithms for processing, adapting, and storing large datasets of rich multi-component video.
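To make the adaptation problem concrete, here is a minimal rate-based bitrate selection rule of the kind many adaptive streaming clients use. It is not the project's algorithm. It assumes a hypothetical bitrate ladder in kbps and a list of recently measured segment throughputs. It uses the harmonic mean because slow samples pull it down, so a single fast download on a fluctuating wireless link does not trigger an aggressive switch up. The `safety` factor of 0.8 is an arbitrary illustrative margin.

```python
from statistics import harmonic_mean


def choose_bitrate(ladder_kbps, recent_throughput_kbps, safety=0.8):
    """Return the highest bitrate in the ladder that fits under a conservative
    throughput estimate; fall back to the lowest rung when nothing fits."""
    lowest = min(ladder_kbps)
    if not recent_throughput_kbps:
        return lowest  # no measurements yet, so start at the lowest quality
    estimate = safety * harmonic_mean(recent_throughput_kbps)
    feasible = [rate for rate in ladder_kbps if rate <= estimate]
    return max(feasible) if feasible else lowest


ladder = [300, 750, 1500, 3000, 6000]
print(choose_bitrate(ladder, [4000, 2000, 4000]))
```

With throughput samples of 4000, 2000, and 4000 kbps, the harmonic mean is 3000 kbps. The safety factor lowers the estimate to 2400 kbps, so the rule picks the 1500 kbps rung.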

  • Intelligent Serious Games for Social and Cognitive Competence

    Nov. 1, 2015 - Nov. 1, 2018

    Funding Institution: EU & Turkish Ministry for EU Affairs

    Principal Investigator: Shervin Shirmohammadi

    The project consortium, coordinated by Prof. Shervin Shirmohammadi of the Computer Science and Engineering Department at ŞEHİR, consists of the University of Pannonia (Hungary), the University of Maribor (Slovenia), ZGURA-M Ltd. (Bulgaria), PhoenixKM (Belgium), Ubited (Turkey), 4flyy Ltd. (Turkey), and Onda Dokuz Education Services (Turkey). The 36-month project aims to improve the quality, attractiveness, and accessibility of lifelong learning opportunities by developing interactive mobile games and 3D simulations based on user-generated scenarios. These will help children and youth with disabilities acquire transversal competencies, such as social competence and creativity, and thereby support their social integration and personal development.

Project Summaries - Completed Projects

  • Clonal Heterogeneity and Development Through Detailed Longitudinal Sampling of Glioma Stem Cells

    July 1, 2016 - July 1, 2017

    Funding Institution: TUBITAK & EU Marie Curie

    Principal Investigator: Mehmet Baysan

    The main objective of this research project is to gain insight into the clonal selection and clonal development patterns of glioblastoma (GBM), the most common and aggressive brain cancer; such insight is critical for developing improved therapies. To address this aim, we will study a recently generated data set that is unique in its detailed sampling of homogeneous and heterogeneous tumorigenic populations from spatially distant locations in brain tissue. To make maximum use of the genomic data, we will also integrate this data set with previously generated data sets from glioma stem cell (GSC) lines and with large-scale patient tumor data sets such as The Cancer Genome Atlas (TCGA).

  • Sharing of Language Resources through Multilingual Representations

    April 1, 2015 - Oct. 1, 2016

    Funding Institution: TUBITAK

    Principal Investigator: Onur Güzey

    In this project, multilingual representations will be used to enable Turkish natural language processing tools to make use of foreign-language resources. The amount and quality of the available data is the determining factor in the success of natural language applications such as question answering and machine translation, so increasing the amount of data available in a language is crucial. Multilingual representations are formed by associating representations generated in one language with representations in another. Through this association, the rich resources of a language such as English can be used for languages with fewer resources. In this project, multilingual representations will be adapted for Turkish, and the representations shown to be useful for Turkish, together with supporting software, will be shared with other researchers.

  • Algorithms and Tools for Computational Modeling of Metabolomics Data

    Jan. 1, 2015 - March 31, 2017

    Funding Institution: TUBITAK

    Principal Investigator: Ali Çakmak

    Metabolic networks consist of interacting biochemical reaction chains (i.e., pathways) that are responsible for essential cellular processes such as energy production, DNA synthesis, and lipid breakdown. Reactions in these networks are connected to each other through the chemical substances (e.g., glucose) that they consume and produce. Such substances are called “metabolites”, and with recent advances in biotechnology it is possible to measure the concentrations of thousands of metabolites in biofluids (e.g., blood and urine) simultaneously. Unusual changes in metabolite concentrations often point to particular physiological conditions. Metabolomics is the study of these concentration changes and the interpretation of their biochemical implications. Interpreting the concentration changes of large numbers of metabolites with respect to a metabolic network is complicated, time-consuming, and cumbersome. To reduce and manage this complexity, most researchers focus on a small, selected set of metabolites out of thousands. This typically omits a large part of the big picture and makes the conclusions of the analysis local and limited. The goal of this project is to develop models, algorithms, and tools that automatically interpret large numbers of metabolite concentration changes in a holistic manner and produce a set of biologically viable hypotheses explaining the observed changes. In addition, we aim to create health applications that make use of the output of these models and algorithms.

  • Adaptive Small Delay Defect Diagnosis

    Jan. 1, 2015 - Dec. 31, 2016

    Funding Institution: TUBITAK

    Principal Investigator: Barış Arslan

    Rapidly increasing product yield for new process generations is crucial to meeting aggressive time-to-market requirements in the semiconductor industry. Furthermore, higher yield is the primary driver in reducing product cost and increasing profit margins. As semiconductor manufacturing processes continue to scale, subtle defects, primarily small delay defects that alter circuit timing, are becoming increasingly common. Fast and accurate localization of small delay defects is essential for identifying their cause and for subsequent yield improvement. However, the small and variable magnitude of these defects, combined with the chip-to-chip variation in timing margins caused by process variation, poses considerable challenges for small delay defect diagnosis. This project develops highly accurate small delay defect diagnosis techniques: optimal conditions for collecting test failure data are determined adaptively for each chip, and on-chip sensor data is combined with statistical pre-silicon analysis techniques to pinpoint failure locations precisely.

  • Search Campaign Management using Topic Models

    March 1, 2014 - Sept. 1, 2015

    Funding Institution: TUBITAK

    Principal Investigator: Ahmet Bulut

    We propose to build a semantic overlay on top of search campaigns in order to avoid the mental mismatch that arises from repetitive campaign audits. Our proposal is to build a latent topic model from advertisement keywords and search terms. An online service embodies a multitude of solution aspects, each serving a specific customer need or information need, and each solution aspect is conceptually a theme or topic. Our hypothesis is that a latent topic model governs a hidden data-generating process that emits the observed search terms; our task is to reverse-engineer the hidden parameters of the topic model from these observations. Each information need corresponds to what a user has in mind while querying the web. For example, a user who wants to create a custom social network for personal use might pose the query “how to create a private social network” to the search engine. Our model-based approach assumes that this query is governed by underlying themes, such as (i) social network creation platforms, (ii) personalization in social networks, and (iii) privacy in social networks. Once these themes are determined, campaign management becomes the management of themes instead of raw sets of keywords. For example, an advertiser who wants to emphasize a certain theme over others can increase the budget allocated to the keywords governed by that theme. Since the model itself is generative, new keywords can also be generated around a selected set of themes.
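One way to picture “managing themes instead of keywords” is to let the advertiser set a budget per theme and spread it over keywords according to keyword-given-theme probabilities. The sketch below assumes such probabilities are already available; in the project, they would come from the fitted latent topic model. The theme names, keywords, and numbers are hypothetical.

```python
def keyword_budgets(theme_budget, keyword_given_theme):
    """Distribute each theme's budget across keywords in proportion to
    p(keyword | theme); a keyword shared by several themes sums its shares."""
    budgets = {}
    for theme, amount in theme_budget.items():
        for keyword, prob in keyword_given_theme[theme].items():
            budgets[keyword] = budgets.get(keyword, 0.0) + amount * prob
    return budgets


# Hypothetical p(keyword | theme) values for two of the themes mentioned above.
keyword_given_theme = {
    "social network creation platforms": {
        "social network builder": 0.6,
        "create social network": 0.4,
    },
    "privacy in social networks": {
        "private social network": 0.7,
        "create social network": 0.3,
    },
}
theme_budget = {
    "social network creation platforms": 100.0,
    "privacy in social networks": 50.0,
}
print(keyword_budgets(theme_budget, keyword_given_theme))
```

Raising the budget of the privacy theme raises spending on every keyword that theme governs, in proportion to how strongly the model ties each keyword to it. The advertiser does not have to edit keywords one by one.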

  • Data Mining Techniques for Fast Query Optimization in Relational Database Systems

    Jan. 1, 2014 - Dec. 31, 2015

    Funding Institution: TUBITAK

    Principal Investigator: Ali Çakmak

    The query optimization module is responsible for generating the lowest-cost execution plans for queries running on database systems. The number of alternative execution plans that must be considered during optimization increases exponentially with the number of tables involved in a query. On the other hand, since optimization is done at compile time, the time available for it is rather limited. Therefore, eliminating plan alternatives that would later be discarded because of their high estimated cost, before they are evaluated, can provide significant performance gains. The purpose of this project is to make query optimization faster and more efficient by using data mining techniques. The project proposes to (i) apply query transformations selectively using graph mining during logical optimization, and (ii) prune high-cost join-order alternatives in advance using sequence mining techniques during physical optimization.
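A few lines of arithmetic show how fast the search space grows. Assume the optimizer considers every join order without restrictions; real optimizers usually prune, for example, cross products. Under that assumption there are n! left-deep join orders over n tables and (2n-2)!/(n-1)! binary join trees when bushy shapes are allowed. The count therefore grows factorially, which is even faster than exponentially. The function names are illustrative.

```python
from math import factorial


def left_deep_orders(n):
    """Left-deep join trees over n tables: one per permutation of the tables."""
    return factorial(n)


def bushy_join_trees(n):
    """Binary join trees with n distinct leaf tables, counting both child orders
    of every join: n! * Catalan(n - 1), which simplifies to (2n - 2)! / (n - 1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 4, 6, 8, 10):
    print(f"{n:2d} tables: {left_deep_orders(n):>12,} left-deep, {bushy_join_trees(n):>16,} bushy")
```

For 10 tables this gives 3,628,800 left-deep orders and 17,643,225,600 bushy trees, which is why pruning alternatives before costing them pays off.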

  • Intelligent Scoping of Paid Search Campaigns using Advertiser & End User Provided Relevance-Feedback

    Oct. 1, 2013 - Oct. 1, 2015

    Funding Institution: TUBITAK

    Principal Investigator: Ahmet Bulut

    We plan to build an automated campaign management tool for paid search advertising. Our goal is to increase sales while making campaign management easier for advertisers. Our approach is to combine positive end-user feedback, in the form of conversions, with negative feedback provided by advertisers to build a conversion model.

  • Topic Strand: Analyzing Social Media Discussions w.r.t. Participation Patterns and Participant Profiles

    Sept. 1, 2013 - Sept. 1, 2015

    Funding Institution: COST IC1205 - Computational Social Choice project

    Principal Investigator: Tarık Arıcı

    Social media discussions exhibit rich temporal dynamics, both in when they occur and in the profiles of the participants who engage in them. The goal of our project is to detect and extract topic discussions from social media content; determine the features that summarize discussion dynamics and select the most distinctive ones; use these distinctive features to form multi-dimensional time series representations (called topic strands); design clustering algorithms for grouping similar topic strands; and integrate multi-resolution analysis techniques to capture similarities across time scales. Dr. Buğra Gedik of the Computer Engineering Department at İhsan Doğramacı Bilkent University and Dr. Mehmet Fatih Aysan of the Sociology Department at İstanbul Şehir University will work as researchers in this interdisciplinary project.
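The multi-resolution idea can be sketched with a Haar-style pyramid, one common choice that the project description does not specify. Each level averages adjacent time steps, halving the resolution, and two strands are compared at every level. For simplicity, the sketch makes several assumptions: a single feature per strand (real topic strands are multi-dimensional), equal lengths that are a power of two, plain Euclidean distance, and hypothetical function names and data.

```python
import math


def haar_levels(series):
    """Return successively coarser versions of a series by averaging adjacent
    pairs (Haar approximation coefficients, up to scaling)."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    levels = [list(series)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev), 2)])
    return levels


def multiscale_distances(a, b):
    """Euclidean distance between two equal-length strands at each resolution,
    from the finest (original) to the coarsest (single average)."""
    return [math.dist(x, y) for x, y in zip(haar_levels(a), haar_levels(b))]


bursty = [0, 4, 0, 4, 0, 4, 0, 4]  # e.g. posts per hour, alternating
steady = [2, 2, 2, 2, 2, 2, 2, 2]
print(multiscale_distances(bursty, steady))
```

The two strands differ at the finest scale but become identical once the hour-to-hour alternation is averaged out. A single-resolution distance would miss this kind of cross-scale similarity.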

  • Testing NVR Suite of Programs with Real NMR Data

    Sept. 1, 2013 - March 1, 2015

    Funding Institution: TUBITAK - CNRS

    Principal Investigator: Mehmet Apaydın

    The joint proposal of Assist. Prof. M. Serkan Apaydın and Dr. Eric Guittet (Institut de Chimie des Substances Naturelles, France) has been awarded travel support under the TÜBİTAK and National Scientific Research Center of France (CNRS) Bilateral Cooperation Program. The project, in which Assoc. Prof. Bülent Çatay from Sabancı University also takes part as a researcher, aims to test the protein structure-based assignment software for nuclear magnetic resonance (NMR) spectroscopy developed in Assist. Prof. Apaydın’s lab on data collected in Dr. Guittet’s laboratory. The TÜBİTAK-CNRS program will provide travel support for the joint project.

  • TTelligent Labs: T2C2

    Sept. 1, 2012 - Dec. 31, 2013

    Funding Institution: Türk Telekom - TUBITAK

    Principal Investigator: Ahmet Bulut

    TTelligent Labs aims to provide a scalable data analysis framework for a wide range of enterprise customers. The framework is provided as a service and runs entirely in the TT compute cloud (T2C2). T2C2 will be built and used for the project and will also be a project deliverable. Customers use T2C2 to run complex data analysis jobs that are otherwise “difficult” to put together, time-intensive to compute, and hard to scale using traditional siloed approaches (e.g., RDBMS, OLAP, data cubes). Furthermore, there is a vast amount of “non-standard” real-time and micro data (e.g., Twitter tweets, Facebook likes, bookmarks, LinkedIn group memberships, Pinterest pins, Google+’s) that cannot practically be integrated in one place. Customers need a parallel and highly scalable set of ad hoc analysis tools to crunch data and drive real-time business insights, and TTelligent Labs facilitates this insight extraction process. The main driver is to transform the business data analytics space by getting business and data analysts used to the following frame of thinking: What insight would I gain if I had full use of a 100-node compute cluster (C2) for an hour? What if one hour of this 100-node C2 cost me only 50 TL? In the traditional approach, a business user first determines what question to ask, and IT then massages the underlying data to answer that question. In the new approach (T2C2), IT provides a platform that enables a creative discovery process, and the business user starts exploring with questions. We see this as the future of computing and of doing business.

  • Intelligent Regularization: Sensitivity-Steered Message Passing

    Sept. 1, 2012 - Dec. 1, 2014

    Funding Institution: TUBITAK

    Principal Investigator: Tarık Arıcı

    In this project, our goal is to achieve a milestone in solving inverse problems. Since inverse problems are essentially underdetermined, assumptions and prior information are used to mitigate their ill-posedness. Using Bregman’s algorithm, we will infer from the data and the current iterate whether the constraints that encapsulate these assumptions and prior information are correct.

  • Real-time Video Analytics Engine Optimized for GPUs

    Jan. 1, 2012 - Dec. 31, 2015

    Funding Institution: EU 7th FP Marie Curie Career Integration Grants (CIG)

    Principal Investigator: Tarık Arıcı

    Video surveillance systems are widely deployed to keep private and public spaces safe and secure. There are over 30 million cameras in the United States alone, shooting 4 billion hours of video footage a week. Currently, analyzing the video captured by surveillance cameras requires significant human supervision. Since it is not possible to analyze all of this video by visual inspection, most of it is stored without ever being processed. Programmable graphics processing units (GPUs) have evolved into multi-threaded, many-core, highly parallel processors. However, to take full advantage of GPUs, algorithms must be highly parallel. The objective of the project is to design and implement parallel video analysis algorithms optimized for the GPU.

  • Energy-Aware Mobile Gaming and Virtual Environments

    Jan. 1, 2012 - Sept. 1, 2012

    Funding Institution: TUBITAK

    Principal Investigator: Shervin Shirmohammadi

    Because of the limited battery life of mobile devices, gameplay on such devices is significantly shorter and of lower quality than on PCs. This limits mobile games, which in turn reduces opportunities in this lucrative market. In this project, we study, design, and develop energy-aware mobile gaming technologies that reduce the power consumed by games on mobile devices, leading to longer playing time and higher gaming quality for players. We do so by designing novel algorithms and techniques for streaming and rendering 3D graphical game objects on mobile devices, including limiting lighting effects and improving brightness control via textural transformation. In addition to 3D graphical objects, we also study video-based game streaming, as used in cloud gaming platforms such as OnLive, and design new methods to adapt the video bitrate to the capabilities of the player’s device without a perceivable loss in the quality of the gaming experience.

  • Automated NMR Structure Based Assignments

    Jan. 10, 2010 - Sept. 11, 2015

    Funding Institution: EU

    Principal Investigator: Mehmet Apaydın

    While automation is revolutionizing many aspects of biology, determining three-dimensional protein structure remains a long, hard, and expensive task. Novel algorithms and computational methods in biomolecular NMR are necessary to apply modern techniques such as structure-based drug design on a much larger scale. The goal of this project is to address a key computational bottleneck in NMR structural biology: resonance assignment. We will accelerate protein NMR assignment by exploiting a priori structural information. By analogy, in X-ray crystallography, the molecular replacement (MR) technique solves the crystallographic phase problem when a “close” or homologous structural model is known, thereby enabling rapid structure determination. In NMR structural biology, by contrast, a key bottleneck is the assignment problem, and an automated procedure for rapidly determining NMR assignments given a homologous structure would similarly accelerate structure determination. Moreover, even when the structure has already been determined by crystallography or computational homology modeling, NMR assignments are valuable because NMR can be used to probe protein-protein interactions, protein-ligand binding (e.g., via chemical shift mapping), and dynamics (e.g., via nuclear spin relaxation). We will develop an MR-like approach for structure-based assignment of resonances and NOEs, to be applied when a homologous protein is known. The tool will accept both CH and NH residual dipolar couplings (RDCs) as well as 4D NOESY data, and it will implement a Bayesian scoring function for structure-based assignments. It will give users the option to use only NH RDCs or both NH and CH RDCs, and it will be tested on real proteins. The source code will be released as open source together with a user manual.

  • Novel Algorithms for NMR Structure-Based Assignments

    Sept. 1, 2009 - Sept. 1, 2011

    Funding Institution: TUBITAK

    Principal Investigator: Mehmet Apaydın

    NMR spectroscopy is one of the techniques used in protein structure determination and functional studies. It is also used to study protein dynamics, in drug development, to investigate protein-protein interactions, and to determine structure-function relationships. The Protein Structure Initiative (PSI), launched by the U.S. National Institutes of Health in 2000 at a cost of 764 million U.S. dollars, aims to determine representative structures for each protein family using NMR and X-ray crystallography. As a result, structures of homologous proteins solved by PSI can be used to determine the structures of the thousands of new protein sequences added to databases every month. The aim is to use homologous protein information to solve the assignment problem, an important and very time-consuming step in NMR protein structure determination, more quickly and accurately. For this purpose, exact and heuristic approaches have been developed for the protein structure-based assignment problem, and these approaches have been tested successfully on a large data set that includes large proteins (more than 200 amino acids). This program is expected to speed up NMR structure determination within the PSI project and to play a role similar to that of molecular replacement in X-ray crystallography.
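A toy instance shows how exact and heuristic approaches to an assignment problem can differ. The sketch below assumes a hypothetical score for every (peak, residue) pair, standing in for the likelihood a Bayesian scoring function would provide. It compares exhaustive search over one-to-one assignments with a greedy heuristic. Neither is the project's actual algorithm, and the score values are made up for illustration.

```python
from itertools import permutations


def total_score(score, assign):
    """Sum of the scores of the chosen (peak, residue) pairs."""
    return sum(score[peak][res] for peak, res in enumerate(assign))


def exact_assignment(score):
    """Exhaustive search over every one-to-one peak-to-residue assignment."""
    n = len(score)
    best = max(permutations(range(n)), key=lambda p: total_score(score, p))
    return list(best), total_score(score, best)


def greedy_assignment(score):
    """Heuristic: repeatedly take the highest-scoring (peak, residue) pair whose
    peak and residue are both still unassigned."""
    n = len(score)
    pairs = sorted(((score[p][r], p, r) for p in range(n) for r in range(n)), reverse=True)
    assign, used_res = [None] * n, set()
    for _, peak, res in pairs:
        if assign[peak] is None and res not in used_res:
            assign[peak] = res
            used_res.add(res)
    return assign, total_score(score, assign)


# Hypothetical scores: rows are peaks, columns are residues (higher is better).
score = [
    [9, 8, 0],
    [8, 0, 0],
    [0, 0, 1],
]
print("exact: ", exact_assignment(score))
print("greedy:", greedy_assignment(score))
```

Here the greedy heuristic grabs the single best pair (score 9) and ends with a total of 10, while exhaustive search finds a total of 17. Exhaustive enumeration grows as n!, so it only works for toy sizes. When the total is a plain sum of independent per-pair scores, as in this example, the Hungarian algorithm solves the problem exactly in polynomial time. Scoring terms that couple pairs of assignments, such as NOE-based distance constraints, tend to make the problem considerably harder.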

İstanbul Şehir University
Altunizade Mah. Oymacı Sok. No: 15
34660 Istanbul, Turkey
Phone: +90 216 559 9000