{"id":14172,"date":"2022-08-25T08:51:25","date_gmt":"2022-08-25T08:51:25","guid":{"rendered":"https:\/\/www.dimensions.ai\/?p=14172"},"modified":"2022-08-25T08:59:52","modified_gmt":"2022-08-25T08:59:52","slug":"how-semantic-search-improves-search-accuracy","status":"publish","type":"post","link":"https:\/\/www.dimensions.ai\/blog\/how-semantic-search-improves-search-accuracy\/","title":{"rendered":"How semantic search improves search accuracy"},"content":{"rendered":"\n<p>Traditionally, search engines use lexical search, in which literal matches of words, phrases, or variants are used to find results. Lexical search therefore gives you easy-to-understand control over your query and the expected matches.<\/p>\n\n\n\n<p>The drawback of this approach is that the meaning behind the query can be lost, because only the characters of the query text are matched. For example, you might miss synonyms or subtypes of the query term that you did not specify explicitly. A further disadvantage is that you cannot resolve ambiguous terms.<\/p>\n\n\n\n<p>Advanced lexical search can return documents that mention a term in either of two ways: as the exact text string, or as the text string plus variants thereof. For example, &#8220;polymers&#8221; is stemmed to &#8220;polym&#8221;, and its variant &#8220;polymer&#8221; is also stemmed to &#8220;polym&#8221;, so both forms match.<\/p>\n\n\n\n<p>As an example, if you were to query \u201cpolymers\u201d in IFI patents as a lexical search with variants, you\u2019d receive 12,407,911 documents. How do you then begin to understand what is really relevant? 
Moreover, if your search covers all parts of a document, including non-relevant sections such as the reference section of scientific articles, the number of potentially non-relevant hits increases further.<\/p>\n\n\n\n<p>This is where semantic search comes into its own.<\/p>\n\n\n\n<p><strong>What is semantic search?<\/strong><\/p>\n\n\n\n<p>Semantic search tries to understand the meaning of the query words or phrases, which improves the accuracy and relevance of search results.<\/p>\n\n\n\n<p>In <a href=\"https:\/\/www.dimensions.ai\/products\/life-sciences-chemistry\/\">Dimensions Life Sciences and Chemistry<\/a> (L&amp;C), we use <a href=\"https:\/\/ontochem.com\/\">OntoChem<\/a>\u2019s ontologies and NLP rules, stored together in dictionary cartridges, to enable semantic searches. These provide the domain knowledge and contextual rules needed to deliver the semantic background and ensure accurate annotation.<\/p>\n\n\n\n<p>Semantic search is more powerful than classical lexical search. 
Thanks to its extended domain knowledge, it usually returns more results, and those results are also of higher relevance.<\/p>\n\n\n\n<p>Two particular advantages of semantic search are that ambiguous terminology is resolved and that all specific subtypes (\u201cchildren\u201d) of a technical term are found without the need to mention them explicitly in the query.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"372\" src=\"https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer-1024x372.png\" alt=\"\" class=\"wp-image-14181\" srcset=\"https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer-1024x372.png 1024w, https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer-300x109.png 300w, https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer-768x279.png 768w, https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer-825x300.png 825w, https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer.png 1084w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In Dimensions L&amp;C, you can search \u2018Ontologically with synonyms\u2019 to find the search term, all synonyms of this concept, and all synonyms of its ontological subclasses. Alternatively, search \u2018Concept only, with synonyms\u2019 to find the search term and all synonyms of this concept, but no ontological subclasses.<\/p>\n\n\n\n<p>For greater relevance, the semantic search is executed only on relevant document parts; for example, the reference section of scientific articles is left out. 
This way, the proportion of highly relevant hits is increased and the number of less relevant hits is reduced.<\/p>\n\n\n\n<p><strong>Examples in action: How semantic search can improve accuracy<\/strong><\/p>\n\n\n\n<p>Problem: Ambiguous terminology<\/p>\n\n\n\n<p>Running the query: \u201ccancer\u201d<\/p>\n\n\n\n<p>A lexical search will return all documents in which \u201ccancer\u201d is mentioned as the disease, but also all documents in which a species of the genus \u201cCancer\u201d, e.g. \u201cCancer borealis\u201d or \u201cCancer irroratus\u201d, is meant.<\/p>\n\n\n\n<p>In a semantic search, the user chooses the search space, that is, whether the term is to be searched as a disease or as a species. Domain-specific context rules determine whether the disease or the species \u201ccancer\u201d is annotated in the text, so the documents returned as hits are usually much more relevant.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/yMf3n6OASH3A6Iit7Ol7kXEzdG_Wf-N5OIOppGqp5oYH08t0lWd05O0ibl1PpgJilGSOm3zb_DqY9oU4c7kCBA4Sse84tfzgWwjz7pRirShmSXLo0aet6pyErS6EvRBMdZ45Zh3F5gY_wsffFWIMbnA\" width=\"587\" height=\"330.63574513626713\"><\/p>\n\n\n\n<p>Figure 4<\/p>\n\n\n\n<p>Running the query: \u201csting\u201d<\/p>\n\n\n\n<p>A lexical search will return all documents in which \u201csting\u201d is mentioned as the injury (disease), but also all documents in which the protein family STING, the musician \u201cSting\u201d, or the verb \u201csting\u201d is meant.<\/p>\n\n\n\n<p>In a semantic search, the user chooses the search space, that is, whether the term is to be searched as a disease or as a gene. Domain-specific context rules determine whether the disease or the protein \u201cSTING\u201d is annotated in the text, so only the relevant documents are returned as hits. 
Using the Domain Explorer, you can select the search space by choosing the domain of interest (Figure 2).<\/p>\n\n\n\n<p>Problem: Abbreviations and acronyms<\/p>\n\n\n\n<p>Running the query: \u201cpmma\u201d<\/p>\n\n\n\n<p>A lexical search will return all documents in which \u201cpmma\u201d is mentioned as the polymer poly(methyl methacrylate), but also all documents in which the gene \u201cpmma\u201d is meant.<\/p>\n\n\n\n<p>In a semantic search, the user chooses the search space, that is, whether the term is to be searched as a polymer or as a gene. Domain-specific context rules determine whether the polymer or the gene \u201cpmma\u201d is annotated in the text, so only the relevant documents are returned as hits. Using the Domain Explorer, you can select the search space by choosing the domain of interest (Figure 5).<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/lbWO4Kg_MgfirpmfMB0KCoV1GMcgbS4DEWrsEynjM6U9Evn7xTgGFMk48C-XtrBAQGgzH-vaB5fv9z_5sOl0v2kJt8wsrQ4EkFwk_ngkJBnzV-SIaiIhnr5HYNxzt7TIBew3AtxdEzxSzuqY2mf__EA\" width=\"587\" height=\"331.42802119260597\"><\/p>\n\n\n\n<p>Figure 5<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Examples in action: How semantic search can improve recall<\/strong><\/p>\n\n\n\n<p>The number of relevant hits increases because a semantic search is performed with an ontological concept that contains all synonyms as well as all ontological descendant concepts (child nodes and their child nodes).<\/p>\n\n\n\n<p>For example, running the query: \u201cpolymer\u201d<\/p>\n\n\n\n<p>A lexical search will return all documents that mention the text string \u201cpolymer\u201d or variants thereof.<\/p>\n\n\n\n<p>A semantic search will return, in addition to all documents containing the text string \u201cpolymer\u201d, documents that mention specific polymers such as poly(methyl methacrylate), perloid, or nylon.<\/p>\n\n\n\n<p>For example, running the query: 
\u201cpesticides\u201d<\/p>\n\n\n\n<p>A lexical search will return all documents that mention the text string \u201cpesticides\u201d or variants thereof.<\/p>\n\n\n\n<p>A semantic search will return, in addition to all documents containing the text string \u201cpesticides\u201d, documents that mention specific pesticides such as bixafen, boscalid, or imazamox.<\/p>\n\n\n\n<p>Dimensions Life Sciences and Chemistry includes both lexical and semantic search; the latter is programmed to interpret over 22 million concepts and over 55 million synonyms for more accurate results.<\/p>\n\n\n\n<p><strong>Get in touch if you\u2019d like a demo to see how it works.<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Traditionally search engines use a lexical search, this is where literal matches of words, phrases, or variants are used to find results. Thus, lexical search allows for an easy-to-understand control of your query and the expected matches.&nbsp; The drawback in this approach is that the meaning behind the query can be lost, as it is [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":14181,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"latestblog_background":"","latestblog_bgcolor":"","latestblog_textcolor":"","latestblog_overlay":false,"inline_featured_image":false,"footnotes":""},"categories":[8],"tags":[],"resource_audience_segment":[],"class_list":["post-14172","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":{"author_image":false,"author_name":"Claudia Bobach, Lauren Black, Felix Berthelmann"},"featured_image_urls":{"full":["https:\/\/www.dimensions.ai\/wp-content\/uploads\/2022\/08\/polymer.png",1084,394,false]},"post_excerpt_dimensions":"<p>Traditionally search engines use a lexical search, this is where literal matches of words, 
phrases, or variants are used to find results. Thus, lexical search allows for an easy-to-understand control of your query and the expected matches.&nbsp; The drawback in this approach is that the meaning behind the query can be lost, as it is&hellip;<\/p>\n","category_list":"<a href=\"https:\/\/www.dimensions.ai\/blog\/category\/blog\/\" rel=\"category tag\">Blog<\/a>","author_info":{"name":"Laura Broadberry","url":"https:\/\/www.dimensions.ai\/blog\/author\/laura-broadberry\/"},"comments_num":"0 comments","_links":{"self":[{"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/posts\/14172","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/comments?post=14172"}],"version-history":[{"count":0,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/posts\/14172\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/media\/14181"}],"wp:attachment":[{"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/media?parent=14172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/categories?post=14172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/tags?post=14172"},{"taxonomy":"resource_audience_segment","embeddable":true,"href":"https:\/\/www.dimensions.ai\/wp-json\/wp\/v2\/resource_audience_segment?post=14172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}