Pathsim: Meta path-based top-k similarity search in heterogeneous information networks

Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu

    Research output: Chapter in Book/Report/Conference proceeding (Chapter)

    • 340 Citations

    Abstract

    Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as the bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks. Different semantic meanings behind paths are not taken into consideration. Thus they cannot be directly applied to heterogeneous networks. In this paper, we study similarity search that is defined among the same type of objects in heterogeneous networks. Moreover, by considering different linkage paths in a network, one could derive various similarity semantics. Therefore, we introduce the concept of meta path-based similarity, where a meta path is a path consisting of a sequence of relations defined between different object types (i.e., structural paths at the meta level). No matter whether a user would like to explicitly specify a path combination given sufficient domain knowledge, or choose the best path by experimental trials, or simply provide training examples to learn it, meta path forms a common base for a network-based similarity search engine. In particular, under the meta path framework we define a novel similarity measure called PathSim that is able to find peer objects in the network (e.g., find authors in the similar field and with similar reputation), which turns out to be more meaningful in many scenarios compared with random-walk based similarity measures. In order to support fast online query processing for PathSim queries, we develop an efficient solution that partially materializes short meta paths and then concatenates them online to compute top-k results. Experiments on real data sets demonstrate the effectiveness and efficiency of our proposed paradigm.
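The PathSim measure and the "concatenate short meta paths" idea named in the abstract can be illustrated with a small sketch. It assumes the definition given in the paper for a symmetric meta path, s(x, y) = 2 * |paths x to y| / (|paths x to x| + |paths y to y|), and uses the meta path author-paper-venue-paper-author. The toy matrices and the function names `pathsim_matrix` and `top_k` are hypothetical, chosen only for illustration. The sketch does not reproduce the paper's partial-materialization or pruning strategy.

```python
import numpy as np

def pathsim_matrix(half_path):
    """PathSim scores for the symmetric meta path formed by `half_path` and its reverse.

    half_path[i, j] = number of path instances from object i to middle object j.
    """
    w = np.asarray(half_path, dtype=float)
    commuting = w @ w.T              # path counts between every pair of objects
    diag = np.diag(commuting)        # path counts from each object back to itself
    denom = diag[:, None] + diag[None, :]
    # Pairs with no path instances at all get similarity 0 instead of NaN.
    return np.divide(2.0 * commuting, denom,
                     out=np.zeros_like(commuting), where=denom > 0)

def top_k(scores, query, k):
    """Indices of the k objects most similar to `query`, excluding the query itself."""
    order = np.argsort(-scores[query], kind="stable")
    return [int(i) for i in order if i != query][:k]

# Hypothetical toy network: 3 authors, 4 papers, 2 venues.
author_paper = np.array([[1, 1, 0, 0],
                         [0, 1, 1, 0],
                         [0, 0, 0, 1]])
paper_venue = np.array([[1, 0],
                        [1, 0],
                        [0, 1],
                        [0, 1]])

# Concatenating two short meta paths (author-paper, paper-venue) gives the
# author-venue half of the symmetric meta path.
author_venue = author_paper @ paper_venue

scores = pathsim_matrix(author_venue)
print(np.round(scores, 4))
print(top_k(scores, query=0, k=2))
```

Because the denominator sums the two self-path counts, an author with far more publications than the query author scores lower than one with a similar publication profile, which is the "peer" behaviour the abstract describes.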

    Language: English (US)
    Title of host publication: Proceedings of the VLDB Endowment
    Pages: 992-1003
    Number of pages: 12
    Volume: 4
    Edition: 11
    State: Published - Aug 2011

    Fingerprint

    Heterogeneous networks
    Search engines
    Semantics
    Query processing
    Experiments

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • Computer Science (all)

    Cite this

    Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. In Proceedings of the VLDB Endowment (11 ed., Vol. 4, pp. 992-1003)

    @inbook{c0e02b13aa9743cf96497395474941f1,
    title = "Pathsim: Meta path-based top-k similarity search in heterogeneous information networks",
    author = "Yizhou Sun and Jiawei Han and Xifeng Yan and Yu, {Philip S.} and Tianyi Wu",
    year = "2011",
    month = "8",
    language = "English (US)",
    volume = "4",
    pages = "992--1003",
    booktitle = "Proceedings of the VLDB Endowment",
    edition = "11",

    }

    TY - CHAP

    T1 - Pathsim

    T2 - Meta path-based top-k similarity search in heterogeneous information networks

    AU - Sun,Yizhou

    AU - Han,Jiawei

    AU - Yan,Xifeng

    AU - Yu,Philip S.

    AU - Wu,Tianyi

    PY - 2011/8

    Y1 - 2011/8

    UR - http://www.scopus.com/inward/record.url?scp=83255185987&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=83255185987&partnerID=8YFLogxK

    M3 - Chapter

    VL - 4

    SP - 992

    EP - 1003

    BT - Proceedings of the VLDB Endowment

    ER -