The techniques implemented in the MeSHSim package for measuring the similarity between two MeSH headings fall into two categories: path-based methods, which use the shortest path between two headings in the ontology hierarchy, and information-based methods, which examine how much information the two headings share.
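
To make the path-based idea concrete, the following minimal sketch counts the edges on the shortest path between two nodes directly from their MeSH tree numbers. It is not MeSHSim code: the helper names tree_levels, common_depth and path_length are hypothetical, and joining different top-level branches through a virtual root is an assumption of the sketch. How MeSHSim converts such a path length into a similarity score is not reproduced here.

```r
# Split a MeSH tree number such as "C01.252.400" into its levels.
tree_levels <- function(node) strsplit(node, ".", fixed = TRUE)[[1]]

# Number of leading levels shared by two tree numbers, i.e. the depth of
# their lowest common ancestor (0 if they share no top-level branch).
common_depth <- function(a, b) {
  la <- tree_levels(a)
  lb <- tree_levels(b)
  n <- min(length(la), length(lb))
  same <- la[seq_len(n)] == lb[seq_len(n)]
  if (all(same)) n else which(!same)[1] - 1
}

# Edges on the shortest path between two nodes. Nodes in different
# top-level branches are joined through a virtual root (assumption).
path_length <- function(a, b) {
  d <- common_depth(a, b)
  (length(tree_levels(a)) - d) + (length(tree_levels(b)) - d)
}

path_length("C01.252.400.210.250.299", "C01.252.400.310")  # 4
```

The two nodes share the prefix C01.252.400, so the path climbs three levels from the first node and one level from the second.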

MeSHSim is implemented as a package for the statistical computing environment R. It depends on three R packages: bitops, XML and RCurl. The bitops package supports bitwise operations on integer vectors and is used by both XML and RCurl. The XML package is used to read XML documents. The RCurl package (Lang, 2007), an R interface to the libcurl library that provides HTTP facilities such as downloading files from Web servers, is used to fetch document information from PubMed. All of these packages are freely available from the CRAN (Comprehensive R Archive Network) package repository.
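
Before installing MeSHSim it can be useful to check which of these dependencies are already present. The sketch below uses only base R functions; the helper name missing_packages is our own and is not part of MeSHSim.

```r
# Packages the text names as dependencies of MeSHSim.
required <- c("bitops", "XML", "RCurl")

# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "),
          " (install them from CRAN with install.packages())")
}
```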

The “method” parameter can be set to SP, WL, WP, LC, Li, Lord, Resnik, Lin or JC, which represent the similarity measures Shortest Path (Bulskov et al., 2002), Weighted Links (Richardson et al., 1994), Wu and Palmer (Wu and Palmer, 1994), Leacock and Chodorow (Leacock and Chodorow, 1994), Li (Li et al., 2003), Lord (Lord et al., 2003), Resnik (Resnik, 1999), Lin (Lin, 1993) and Jiang and Conrath (Jiang and Conrath, 1998), respectively.
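
For the information-based measures in this list, the following sketch shows the textbook definitions of the Resnik, Lin and Jiang and Conrath quantities on a toy hierarchy. The node names and occurrence probabilities are invented for illustration, and the function names are hypothetical. MeSHSim may normalise or transform these quantities, so the numbers are not comparable with the package outputs shown in this document.

```r
# Hypothetical occurrence probabilities for a toy hierarchy in which
# "root.a" and "root.b" are siblings under "root".
p <- c(root = 0.5, root.a = 0.125, root.b = 0.25)

# Information content: rarer nodes carry more information.
ic <- -log2(p)   # root = 1, root.a = 3, root.b = 2

# Resnik: information content of the lowest common ancestor (lca).
resnik <- function(ic, lca) unname(ic[lca])

# Lin: shared information relative to the information of both nodes.
lin <- function(ic, a, b, lca) unname(2 * ic[lca] / (ic[a] + ic[b]))

# Jiang and Conrath: a distance, so smaller values mean more similar.
jc_distance <- function(ic, a, b, lca) unname(ic[a] + ic[b] - 2 * ic[lca])

resnik(ic, "root")                            # 1
lin(ic, "root.a", "root.b", "root")           # 0.4
jc_distance(ic, "root.a", "root.b", "root")   # 3
```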

Here are some detailed examples for the function “nodeSim”:


      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="SP")
      [1] 0.8
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="WL")
      [1] 0.809609
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="WP")
      [1] 0.6666667
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="LC")
      [1] 0.5
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="Li")
      [1] 0.4419936
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="Lord")
      [1] 0.9999999
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="Resnik")
      [1] 0.3542506
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="Lin")
      [1] 0.5613063
      ➢	nodeSim("C01.252.400.210.250.299", "C01.252.400.310",method="JC")
      [1] 3.050786e-10
          

Since there are two frameworks, heading-based and node-based, for calculating the similarity of two MeSH headings, the “frame” parameter of the functions “headingSim” and “mheadingSim” can be set to either “heading” or “node”.

Here are some detailed examples for the function “headingSim”:


      ➢	headingSim("Lumbosacral Region", "Body Regions",method="WL",frame="node")
      [1] 0.7535095
      ➢	headingSim("Lumbosacral Region", "Body Regions",method="WL",frame="heading")
      [1] 0.7535095
          

Since the usage of the functions mnodeSim and mheadingSim is similar to that of nodeSim and headingSim, we only give some examples here. Both take two vectors of nodes or headings and return a matrix of pairwise similarities.


      ➢	a<-c("B03.440.400.425.325.150", "B03.440.400.425.117.800.200", "B03.440.400.425.188.102")
      ➢	b<-c("D23.050.301.290.538", "B04.820.880.800", "B01.650.940.800.575")
      ➢	mnodeSim(a, b,method="SP")
                [,1]       [,2]       [,3]
      [1,] 0.0000000  0.0000000  0.0000000
      [2,] 0.5238095  0.4761905  0.5238095
      [3,] 0.4761905  0.4285714  0.4761905
      ➢	mnodeSim(a, a,method="SP")
                [,1]       [,2]       [,3]
      [1,] 1.0000000  0.7619048  0.8095238
      [2,] 0.7619048  1.0000000  0.7619048
      [3,] 0.8095238  0.7619048  1.0000000
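
The matrix layout above can be mimicked with any pairwise function. The sketch below builds such a matrix with a toy similarity; toy_sim and pairwise_matrix are hypothetical names, toy_sim is only an illustrative stand-in and not one of the MeSHSim measures, and letting rows index the first argument is an assumption.

```r
# Toy similarity between two tree numbers: number of leading levels they
# share, divided by the depth of the deeper node (not a MeSHSim measure).
toy_sim <- function(x, y) {
  lx <- strsplit(x, ".", fixed = TRUE)[[1]]
  ly <- strsplit(y, ".", fixed = TRUE)[[1]]
  n <- min(length(lx), length(ly))
  # cumprod() drops to 0 at the first mismatch, so the sum counts
  # only the leading matches.
  shared <- sum(cumprod(lx[seq_len(n)] == ly[seq_len(n)]))
  shared / max(length(lx), length(ly))
}

# Apply `sim` to every pair (a[i], b[j]); rows follow a, columns follow b.
pairwise_matrix <- function(a, b, sim) {
  m <- matrix(NA_real_, nrow = length(a), ncol = length(b))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      m[i, j] <- sim(a[i], b[j])
    }
  }
  m
}

a <- c("B03.440.400", "B03.440.425", "B03.660")
pairwise_matrix(a, a, toy_sim)
```

Comparing a vector with itself gives a symmetric matrix with ones on the diagonal, as in the mnodeSim(a, a) example.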
      

Here is an example for the function “mheadingSim”:


      ➢	a<-c("Body Regions", "Abdomen", "Abdominal Cavity")
      ➢	mheadingSim(a,a,method="WP",frame="node")
                [,1]       [,2]       [,3]
      [1,] 1.0000000  0.6666667  0.5714286
      [2,] 0.6666667  1.0000000  0.8888889
      [3,] 0.5714286  0.8888889  1.0000000
          

The function docSim calculates the similarity between two Medline documents; it uses the RCurl package to fetch the MeSH terms of the given documents.

Here is an example for the function "docSim":


      ➢	docSim("2189633", "18974831", method="SP",frame="node")
      [1] 0.1
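
This document does not state how docSim combines the similarities of individual headings into a single document score. As an illustration only, the sketch below applies one common aggregation, the best-match average, to a made-up matrix of pairwise heading similarities. The function name best_match_average and the numbers are hypothetical, and docSim may well use a different rule.

```r
# Best-match average: every heading of one document is paired with its
# most similar heading in the other document, and the best scores from
# both directions are averaged.
best_match_average <- function(m) {
  row_best <- apply(m, 1, max)   # best match for each heading of document 1
  col_best <- apply(m, 2, max)   # best match for each heading of document 2
  (sum(row_best) + sum(col_best)) / (nrow(m) + ncol(m))
}

# Made-up similarities: 2 headings in document 1, 3 in document 2.
m <- matrix(c(1.0, 0.2, 0.4,
              0.5, 0.3, 0.9), nrow = 2, byrow = TRUE)

best_match_average(m)   # (1.0 + 0.9 + 1.0 + 0.3 + 0.9) / 5 = 0.82
```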
          

The function docInfo retrieves the title, abstract and MeSH headings of an article identified by its PMID. The “verbose” parameter is a Boolean that sets whether to fetch the abstract and title of the article, and the “major” parameter is a Boolean that controls whether only major headings are fetched.


      ➢	docInfo("111111",verbose=TRUE,major=FALSE)
      [1] "Title: [Diagnosis of acute gastrointestinal hemorrhages]."
      [1] "Abstract: NA"
      [1] MeSH headings:
      [1] "Angiography" "Digestive System"
      [3] "Emergencies" "Endoscopy"
      [5] "Gastrointestinal Hemorrhage" "Humans"
          

Due to space limitations, the function nodeInfo is not shown in our paper. Here we show how to use it and what information it returns. nodeInfo returns the parent nodes and child nodes of the specified node, together with the MeSH terms of the related nodes.


      ➢	nodeInfo("B01.650.940.800.575")
      $B01
      $B01$B01.043
      [1] "B01.043"

      $B01$B01.046
      [1] "B01.046"

      $B01$B01.050
      [1] "B01.050"

      $B01$B01.175
      [1] "B01.175"

      $B01$B01.206
      [1] "B01.206"

      $B01$B01.237
      [1] "B01.237"

      $B01$B01.268
      [1] "B01.268"

      $B01$B01.300
      [1] "B01.300"

      $B01$B01.400
      [1] "B01.400"

      $B01$B01.500
      [1] "B01.500"

      $B01$B01.625
      [1] "B01.625"

      $B01$B01.630
      [1] "B01.630"

      $B01$B01.650
      $B01$B01.650$B01.650.085
      [1] "B01.650.085"

      $B01$B01.650$B01.650.232
      [1] "B01.650.232"

      $B01$B01.650$B01.650.449
      [1] "B01.650.449"

      $B01$B01.650$B01.650.510
      [1] "B01.650.510"

      $B01$B01.650$B01.650.520
      [1] "B01.650.520"

      $B01$B01.650$B01.650.560
      [1] "B01.650.560"

      $B01$B01.650$B01.650.660
      [1] "B01.650.660"

      $B01$B01.650$B01.650.700
      [1] "B01.650.700"

      $B01$B01.650$B01.650.723
      [1] "B01.650.723"

      $B01$B01.650$B01.650.915
      [1] "B01.650.915"

      $B01$B01.650$B01.650.940
      $B01$B01.650$B01.650.940$B01.650.940.150
      [1] "B01.650.940.150"

      $B01$B01.650$B01.650.940$B01.650.940.800
      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.150
      [1] "B01.650.940.800.150"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575
      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.100
      [1] "B01.650.940.800.575.100"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.118
      [1] "B01.650.940.800.575.118"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.137
      [1] "B01.650.940.800.575.137"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.175
      [1] "B01.650.940.800.575.175"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.400
      [1] "B01.650.940.800.575.400"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.462
      [1] "B01.650.940.800.575.462"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.525
      [1] "B01.650.940.800.575.525"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$B01.650.940.800.575.825
      [1] "B01.650.940.800.575.825"

      $B01$B01.650$B01.650.940$B01.650.940.800$B01.650.940.800.575$term
      [1] "Embryophyta"


      $B01$B01.650$B01.650.940$B01.650.940.800$term
      [1] "Streptophyta"


      $B01$B01.650$B01.650.940$B01.650.940.900
      [1] "B01.650.940.900"

      $B01$B01.650$B01.650.940$term
      [1] "Viridiplantae"


      $B01$B01.650$term
      [1] "Plants"


      $B01$B01.675
      [1] "B01.675"

      $B01$B01.680
      [1] "B01.680"

      $B01$B01.750
      [1] "B01.750"

      $B01$term
      [1] "Eukaryota"
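
The printed result is a nested named list in which each node on the path to the requested node carries a "term" entry. The sketch below walks such a list to collect the MeSH terms from the top of the tree down to a given node. It uses a hand-built, abridged copy of the output above instead of calling nodeInfo, and the helper names ancestor_ids and term_path are hypothetical.

```r
# Abridged, hand-built copy of the structure printed above.
info <- list(B01 = list(
  B01.043 = "B01.043",
  B01.650 = list(
    B01.650.085 = "B01.650.085",
    B01.650.940 = list(
      B01.650.940.150 = "B01.650.940.150",
      term = "Viridiplantae"
    ),
    term = "Plants"
  ),
  term = "Eukaryota"
))

# "B01.650.940" -> c("B01", "B01.650", "B01.650.940")
ancestor_ids <- function(node) {
  parts <- strsplit(node, ".", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(k) paste(parts[seq_len(k)], collapse = "."),
         character(1))
}

# Follow the tree number level by level and collect each "term" entry.
term_path <- function(info, node) {
  terms <- character(0)
  current <- info
  for (id in ancestor_ids(node)) {
    current <- current[[id]]
    if (!is.list(current)) break   # leaf or unknown node: no term stored
    terms <- c(terms, current[["term"]])
  }
  terms
}

term_path(info, "B01.650.940")   # "Eukaryota" "Plants" "Viridiplantae"
```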
          

Due to space limitations, the function termInfo is not shown in our paper. Here we show how to use it and what information it returns. termInfo returns the parent nodes and child nodes of the nodes associated with the specified MeSH term, together with the MeSH terms of the related nodes.


      ➢	termInfo("Eukaryota")
      [[1]]
      [[1]]$B01
      [[1]]$B01$B01.043
      [1] "B01.043"

      [[1]]$B01$B01.046
      [1] "B01.046"

      [[1]]$B01$B01.050
      [1] "B01.050"

      [[1]]$B01$B01.175
      [1] "B01.175"

      [[1]]$B01$B01.206
      [1] "B01.206"

      [[1]]$B01$B01.237
      [1] "B01.237"

      [[1]]$B01$B01.268
      [1] "B01.268"

      [[1]]$B01$B01.300
      [1] "B01.300"

      [[1]]$B01$B01.400
      [1] "B01.400"

      [[1]]$B01$B01.500
      [1] "B01.500"

      [[1]]$B01$B01.625
      [1] "B01.625"

      [[1]]$B01$B01.630
      [1] "B01.630"

      [[1]]$B01$B01.650
      [1] "B01.650"

      [[1]]$B01$B01.675
      [1] "B01.675"

      [[1]]$B01$B01.680
      [1] "B01.680"

      [[1]]$B01$B01.750
      [1] "B01.750"

      [[1]]$B01$term
      [1] "Eukaryota"

      [[1]]$B01$B01
      [[1]]$B01$B01$B01.043
      [1] "B01.043"

      [[1]]$B01$B01$B01.046
      [1] "B01.046"

      [[1]]$B01$B01$B01.050
      [1] "B01.050"

      [[1]]$B01$B01$B01.175
      [1] "B01.175"

      [[1]]$B01$B01$B01.206
      [1] "B01.206"

      [[1]]$B01$B01$B01.237
      [1] "B01.237"

      [[1]]$B01$B01$B01.268
      [1] "B01.268"

      [[1]]$B01$B01$B01.300
      [1] "B01.300"

      [[1]]$B01$B01$B01.400
      [1] "B01.400"

      [[1]]$B01$B01$B01.500
      [1] "B01.500"

      [[1]]$B01$B01$B01.625
      [1] "B01.625"

      [[1]]$B01$B01$B01.630
      [1] "B01.630"

      [[1]]$B01$B01$B01.650
      [1] "B01.650"

      [[1]]$B01$B01$B01.675
      [1] "B01.675"

      [[1]]$B01$B01$B01.680
      [1] "B01.680"

      [[1]]$B01$B01$B01.750
      [1] "B01.750"

      [[1]]$B01$B01$term
      [1] "Eukaryota"

      [[1]]$B01$B01$B01
      [[1]]$B01$B01$B01$B01.043
      [1] "B01.043"

      [[1]]$B01$B01$B01$B01.046
      [1] "B01.046"

      [[1]]$B01$B01$B01$B01.050
      [1] "B01.050"

      [[1]]$B01$B01$B01$B01.175
      [1] "B01.175"

      [[1]]$B01$B01$B01$B01.206
      [1] "B01.206"

      [[1]]$B01$B01$B01$B01.237
      [1] "B01.237"

      [[1]]$B01$B01$B01$B01.268
      [1] "B01.268"

      [[1]]$B01$B01$B01$B01.300
      [1] "B01.300"

      [[1]]$B01$B01$B01$B01.400
      [1] "B01.400"

      [[1]]$B01$B01$B01$B01.500
      [1] "B01.500"

      [[1]]$B01$B01$B01$B01.625
      [1] "B01.625"

      [[1]]$B01$B01$B01$B01.630
      [1] "B01.630"

      [[1]]$B01$B01$B01$B01.650
      [1] "B01.650"

      [[1]]$B01$B01$B01$B01.675
      [1] "B01.675"

      [[1]]$B01$B01$B01$B01.680
      [1] "B01.680"

      [[1]]$B01$B01$B01$B01.750
      [1] "B01.750"

      [[1]]$B01$B01$B01$term
      [1] "Eukaryota"
          
  • Bulskov, H., Knappe, R., and Andreasen, T. (2002). On measuring similarity for conceptual querying. Proceedings of the 5th International Conference on Flexible Query Answering Systems (FQAS02), 2522, 100–111.
  • Cohen, P. and Kjeldsen, R. (1987). Information retrieval by constrained spreading activation in semantic networks. Information Processing and Management, 23(4), 255–268.
  • Jiang, J. and Conrath, D. (1998). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistic, Taiwan.
  • Leacock, C. and Chodorow, M. (1994). Filling in a sparse training space for word sense identification. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL94).

  • Li, Y., Bandar, Z. A., and McLean, D. (2003). An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4), 871–882.

  • Lin, D. (1993). Principle-based parsing without overgeneration. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL93), pages 112–120.

  • Lord, P., Stevens, R., Brass, A., and Goble, C. (2003). Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation. Bioinformatics, 19(10), 1275–1283.
  • Nelson, S., Schopen, M., AG, S., S.JL, and A.N (2004). The mesh translation maintenance system: Structure, interface design, and implementation. In Proceedings of MEDINFO.
  • Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11, 95–130.

  • Richardson, R., Smeaton, A., and Murphy, J. (1994). Using wordnet as a knowledge base for measuring semantic similarity between words. School of Computer Applications, Dublin City University.

  • Wang, J., Du, Z., Payattakool, R., Yu, P., and Chen, C. (2007). A new method to measure the semantic similarity of go terms. Bioinformatics, 23, 1274–1281.
  • Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL’94), pages 133–138.

  • Zhu, S., Zeng, J., and Mamitsuka, H. (2009). Enhancing medline document clustering by incorporating mesh semantic similarity. Bioinformatics, 25(15), 1944–1951.