Database Systems for Advanced Applications: 20th by Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, Muhammad Aamir

Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, Muhammad Aamir Cheema

This two-volume set LNCS 9049 and LNCS 9050 constitutes the refereed proceedings of the 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015, held in Hanoi, Vietnam, in April 2015.

The 63 full papers presented were carefully reviewed and selected from a total of 287 submissions. The papers cover the following topics: data mining; data streams and time series; database storage and index; spatio-temporal data; modern computing platform; social networks; information integration and data quality; information retrieval and summarization; security and privacy; outlier and imbalanced data analysis; probabilistic and uncertain data; query processing.

The random projections discussed here are not a general-purpose technique: the Johnson-Lindenstrauss lemma only guarantees the existence of a random projection that preserves distances, and different distance functions may require different projections. The projections discussed here were for unweighted Lp-norm distances. Furthermore, as pointed out by Kabán [20], random projection methods are not suitable to defy the “concentration of distances” aspect of the “curse of dimensionality” [43]: since, according to the Johnson-Lindenstrauss lemma, distances are preserved approximately, these projections also preserve the distance concentration.
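As a rough illustration of the distance-preservation property mentioned above, the following sketch projects a few random points with a Gaussian random matrix and compares pairwise Euclidean distances before and after. It is only a minimal example under stated assumptions: the Gaussian construction, the dimensions d = 200 and k = 100, and all function names are choices made here for illustration, not taken from the text.

```python
import math
import random


def gaussian_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def euclidean(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_ratios(points, matrix):
    """Ratio of projected distance to original distance for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = euclidean(points[i], points[j])
            ratios.append(euclidean(projected[i], projected[j]) / original)
    return ratios


rng = random.Random(0)          # fixed seed so the run is repeatable
d, k = 200, 100                 # illustrative dimensions only
points = [[rng.random() for _ in range(d)] for _ in range(10)]
R = gaussian_projection_matrix(d, k, rng)
ratios = distance_ratios(points, R)
print(min(ratios), max(ratios))
```

If every ratio stays close to 1, all pairwise distances are scaled by roughly the same factor, which is also why the relative spread of distances, and hence any distance concentration in the original data, carries over to the projected space.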

If supervised information could be used in the dimensionality reduction algorithm to maximize the separation between the minority class and the majority class, better results would be expected. Secondly, although synthetic oversampling methods have achieved satisfactory results for imbalanced learning, many other methods exist. Recently, model-based oversampling methods such as SPO [20][21] and MoGT [22] have been proposed. SPO [20][21] assumes that the minority samples follow a multivariate Gaussian distribution.
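To make the Gaussian assumption concrete, the sketch below fits a mean and covariance to a handful of minority samples and draws synthetic samples from the fitted distribution. This is not the SPO algorithm itself, whose details are not given here; it only illustrates the general idea of model-based oversampling under a multivariate Gaussian assumption. The function names, the small ridge term added to the covariance diagonal, and the toy data are all hypothetical choices made for this example.

```python
import math
import random


def mean_vector(samples):
    """Component-wise mean of the samples."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance_matrix(samples, mean, ridge=1e-6):
    """Unbiased sample covariance; `ridge` (an assumption) keeps it positive definite."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive definite a."""
    d = len(a)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def gaussian_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic samples from a Gaussian fitted to the minority class."""
    rng = random.Random(seed)
    mu = mean_vector(minority)
    lower = cholesky(covariance_matrix(minority, mu))
    d = len(mu)
    synthetic = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # x = mu + L z has mean mu and covariance L L^T
        synthetic.append(
            [mu[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        )
    return synthetic


# Toy minority class, made up for illustration.
minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.3], [1.1, 2.1], [0.9, 1.8]]
synthetic = gaussian_oversample(minority, n_new=20)
print(len(synthetic), synthetic[0])
```

A plain Gaussian fit like this needs more minority samples than dimensions for the covariance estimate to be meaningful; with very few samples the ridge term dominates and the synthetic data become unreliable.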

