Find Conspirators by Message Traffic

Authors: Jiajun Lv, Xinlei Chen, Yu Xia

MCM/ICM Contest 2012, Meritorious Winner


To detect conspirators from the provided message traffic, we first build a basic model based on the similarity between each individual and those already identified. Specifically, we construct a graph that encodes the connections between individuals in the social network. We then measure pairwise similarity by the commute distance between two nodes, which can be computed efficiently from the pseudoinverse of the graph Laplacian matrix. Finally, a logistic regression model is trained on the pairwise-distance feature vectors to predict each individual's probability of being part of the conspiracy. We then apply the model to the data of the current problem. First, we demonstrate its correctness and robustness by training it with only a few identified individuals and testing whether it correctly predicts the identities of the rest. After this validation, we compute each individual's probability of being a conspirator, using all identified individuals to train the logistic regression. We also study the sensitivity of the outcomes to the parameters used in the initial graph construction. Further analysis shows that our model implicitly gives each individual a representation that carries many properties of the network, especially the local structure of the constructed graph. Based on this observation, we seek to enhance our model by incorporating this prior information into state-of-the-art models from semantic network and text analysis, which typically learn an explicit representation for each individual. As a case study, we investigate two models, Probabilistic Latent Semantic Indexing (PLSI) and Nonnegative Matrix Factorization (NMF). We propose a regularization framework to achieve this goal and introduce an efficient algorithm to solve the resulting optimization problem. Finally, we generalize our model to other applications and offer a few notes on scalability.
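The basic model described above can be illustrated with a minimal sketch. It is not the code from the paper: the toy graph, the unit edge weights, the feature scaling, the plain gradient-descent training loop, and all function names (`commute_distances`, `fit_logistic`, `predict_proba`) are assumptions made for illustration. It uses the standard identity for commute distance, C_ij = vol(G) (L+_ii + L+_jj - 2 L+_ij), where L+ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of all node degrees.

```python
import numpy as np

def commute_distances(W):
    """Commute distance between every pair of nodes of a weighted graph.

    W is a symmetric, nonnegative adjacency matrix of a connected graph.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    l_pinv = np.linalg.pinv(laplacian)      # Moore-Penrose pseudoinverse
    d = np.diag(l_pinv)
    # C_ij = vol(G) * (L+_ii + L+_jj - 2 * L+_ij)
    return degrees.sum() * (d[:, None] + d[None, :] - 2.0 * l_pinv)

def fit_logistic(X, y, lr=0.5, steps=500):
    """Logistic regression fitted by batch gradient descent (illustrative)."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-X @ w))

# Hypothetical toy network: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0                 # assumed unit weights

C = commute_distances(W)

# Assumed labels: node 0 is a known conspirator, node 5 a known non-conspirator.
labeled = [0, 5]
labels = np.array([1.0, 0.0])

# Feature vector of a node: its scaled commute distances to the labeled nodes.
features = C[:, labeled] / C.max()
w = fit_logistic(features[labeled], labels)
probs = predict_proba(w, features)          # one probability per node
```

In this toy setup, `probs[i]` is the estimated probability that node `i` belongs to the conspiracy. Nodes in the same triangle as node 0 receive higher values than nodes near node 5, since their commute distance to the known conspirator is smaller.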



Paper: Download 533KB


Please send email to us if you have any questions.