Projects

We are working on many projects at the moment, including:

MarkIt

MarkIt is a program for mapping human gene names to Entrez Gene identifiers (GeneIDs). We view gene name identification as a modular process involving term recognition, classification, and mapping [see our previous publication for details]. MarkIt focuses on the mapping step and uses an existing program (ABNER) for gene name recognition and classification (entity recognition). We combine two methods to map recognized entities to their appropriate GeneIDs: the Trigram Method and the Network Method. Both methods require a preprocessing step that uses resources from Entrez Gene to construct a set of method-specific matrices. We first address lexical variation by transforming gene names into their unique “trigrams” (groups of three alphanumeric characters) and matching these trigrams against the preprocessed gene dictionary. For ambiguous gene names, we additionally perform a contextual analysis of the abstract that contains the recognized entity. We have formalized our method as a sequence of matrix manipulations, allowing for a fast and coherent implementation of the algorithm (see BioCreative II).
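To make the trigram matching step concrete, here is a minimal sketch in Python. It is not MarkIt's implementation: the normalization (lowercasing and dropping non-alphanumeric characters), the Dice overlap score, the function names, and the two-entry toy dictionary are all assumptions chosen for illustration. The trigram-by-entry matrix is stored sparsely as a mapping from each trigram to the dictionary entries that contain it.

```python
import re
from collections import defaultdict


def trigrams(name):
    """Return the unique 3-character windows over a name's alphanumeric characters."""
    # Assumed normalization: lowercase, then drop everything except letters and digits.
    cleaned = re.sub(r"[^a-z0-9]", "", name.lower())
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def build_index(gene_dictionary):
    """Preprocess {gene_id: [names]} into a sparse trigram-by-entry matrix.

    index[trigram] is the set of (gene_id, name) entries containing that trigram;
    sizes[(gene_id, name)] is the number of unique trigrams in that entry.
    """
    index = defaultdict(set)
    sizes = {}
    for gene_id, names in gene_dictionary.items():
        for name in names:
            grams = trigrams(name)
            sizes[(gene_id, name)] = len(grams)
            for gram in grams:
                index[gram].add((gene_id, name))
    return index, sizes


def map_name(mention, index, sizes):
    """Rank gene IDs by the best Dice overlap between the mention and any of their names."""
    query = trigrams(mention)
    shared = defaultdict(int)
    for gram in query:
        for entry in index.get(gram, ()):
            shared[entry] += 1
    best = {}
    for entry, count in shared.items():
        score = 2 * count / (len(query) + sizes[entry])  # Dice coefficient (assumed scoring)
        gene_id = entry[0]
        best[gene_id] = max(best.get(gene_id, 0.0), score)
    # Highest score first; ties broken by gene ID for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy dictionary for illustration only; a real one would be built from Entrez Gene.
TOY_DICTIONARY = {
    "7157": ["TP53", "tumor protein p53"],
    "672": ["BRCA1", "breast cancer 1"],
}

index, sizes = build_index(TOY_DICTIONARY)
print(map_name("TP-53", index, sizes))
```

Because the hyphen is dropped before trigrams are formed, the variant "TP-53" yields the same trigrams as the dictionary name "TP53". A mention whose trigrams match several genes equally well would remain ambiguous at this stage, which is where the contextual analysis of the abstract comes in.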

YIF

Yale Image Finder (YIF) is a publicly accessible search engine featuring a new way of retrieving biomedical images and associated papers based on the text carried inside the images. Image queries can also be issued against the image caption, as well as words in the associated paper abstract and title. A typical search scenario using YIF is as follows: a user provides a few search keywords, and the most relevant images are returned and presented as thumbnails. Users can click on an image of interest to retrieve the high-resolution image. In addition, the search engine provides two types of related images: those that appear in the same paper, and those from other papers with similar image content. Retrieved images link back to their source papers, allowing users to find related papers starting with an image of interest. Currently, YIF has indexed over 140,000 images from over 34,000 public-access biomedical journal papers.
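The search scenario above can be sketched as a small field-aware inverted index. This is an illustrative sketch, not YIF's implementation: the record layout, the function names, the matching rule (every keyword must appear in at least one of the searched fields), and the three toy records are all assumptions, and the text inside each image is assumed to have been extracted already. Only the "same paper" kind of related image is shown; similarity by image content is not modeled here.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

FIELDS = ("image_text", "caption", "title", "abstract")


@dataclass(frozen=True)
class ImageRecord:
    image_id: str
    paper_id: str
    image_text: str  # text carried inside the image, assumed already extracted
    caption: str
    title: str
    abstract: str


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_field_index(records):
    """Build one inverted index per field: token -> set of image IDs."""
    index = {field: defaultdict(set) for field in FIELDS}
    for record in records:
        for field in FIELDS:
            for token in tokenize(getattr(record, field)):
                index[field][token].add(record.image_id)
    return index


def search(index, query, fields=FIELDS):
    """Return image IDs where every query keyword occurs in at least one searched field."""
    hits = None
    for token in tokenize(query):
        matches = set()
        for field in fields:
            matches |= index[field].get(token, set())
        hits = matches if hits is None else hits & matches
    return sorted(hits or ())


def same_paper(records, image_id):
    """Return the other images that appear in the same paper as image_id."""
    paper_id = next(r.paper_id for r in records if r.image_id == image_id)
    return sorted(r.image_id for r in records
                  if r.paper_id == paper_id and r.image_id != image_id)


# Toy records for illustration only.
RECORDS = [
    ImageRecord("img1", "paperA", "p53 MDM2 kinase", "Western blot of p53",
                "Regulation of p53", "We study p53 stability."),
    ImageRecord("img2", "paperA", "actin control", "Loading control",
                "Regulation of p53", "We study p53 stability."),
    ImageRecord("img3", "paperB", "BRCA1 foci", "Immunofluorescence of BRCA1",
                "DNA repair by BRCA1", "BRCA1 forms foci."),
]

field_index = build_field_index(RECORDS)
print(search(field_index, "p53", fields=("image_text",)))
print(search(field_index, "p53"))
print(same_paper(RECORDS, "img1"))
```

Restricting the search to `image_text` finds only images whose own text contains the keyword, while searching all fields also returns images that match through their caption or through the title or abstract of their paper.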

The paper is available at http://bioinformatics.oxfordjournals.org/cgi/reprint/btn340

The search engine is available at http://krauthammerlab.med.yale.edu/imagefinder

caBIG

caBIG is the NCI’s cancer Biomedical Informatics Grid. The Krauthammer Lab contributes to caBIG’s architecture, tissue banking, and pathology tools projects.