Medical Image Retrieval Algorithm Based on Content

With the rapid development of science and technology and the continuing improvement of medical services, medical images play an increasingly prominent role in clinical diagnosis and treatment. Helping doctors find the images they need among massive image collections has become an important task. Text-based medical image retrieval can no longer meet the demands of retrieval at this scale, whereas content-based medical image retrieval is well established and still holds considerable research potential. This paper first introduces mature medical image retrieval techniques and the criteria for evaluating their effectiveness, and then presents an improved retrieval algorithm that combines text-based and content-based retrieval. Finally, comparative experiments, illustrated with sample figures, verify the conclusions.


INTRODUCTION
With the rapid progress of medical technology, medical institutions produce massive numbers of medical images every day, and the effective management of these images has become a research hotspot.
Traditional text-based medical image retrieval relies on information such as the patient's name, the number of images, or the medical record number. It is intuitive and easy to operate, but it requires the images to be labeled manually. For massive image collections, manually labeling every image is impractical, and human descriptions are inevitably imprecise and subjective. For these reasons, text-based techniques cannot meet the needs of today's large-scale image retrieval [1].
Content-based medical image retrieval was designed to overcome these shortcomings. It processes the image content, such as gray values and topological and geometric shape, to extract visual features, and then builds retrieval criteria on those features. This paper focuses on algorithms for content-based medical image retrieval.

SYSTEM STRUCTURE OF CONTENT-BASED MEDICAL IMAGE RETRIEVAL
A content-based medical image retrieval system (illustrated in Fig. (1)) can be regarded as an information service system that links the user with the medical image database.
*Address correspondence to this author at Department of Computer Science and Engineering, Jilin Jianzhu University, Changchun 130118, China; Tel: +8613944104396; E-mail: duliying@126.com

The system works as follows. First, it analyzes all the images in the database, extracts their distinctive features, and builds the medical image database together with the corresponding feature library. At retrieval time, it extracts the features of the query image and matches them against all the features in the library; according to the matching result, it then returns the desired images from the database. A content-based medical image retrieval system consists of an image processing module and a retrieval module. The main functions of the processing module are image labeling, feature extraction, index construction, and storage of image data. It also includes techniques for extracting and marking regions of interest, classifying images, and segmenting images of various sizes.
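The two stages described above (building the feature library offline, then matching a query against it) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `extract_features` is a hypothetical stand-in that uses only the mean and standard deviation of gray values, and ranking by Euclidean distance is an assumed similarity measure.

```python
import math

def extract_features(image):
    """Hypothetical feature extractor: mean and standard deviation of gray values.

    `image` is a 2-D list of gray values. A real system would return texture
    and shape descriptors instead of these two statistics.
    """
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    return (mean, math.sqrt(variance))

def build_feature_library(database):
    """Offline stage: map every image id in the database to its feature vector."""
    return {image_id: extract_features(image) for image_id, image in database.items()}

def retrieve(query_image, library, top_k=3):
    """Online stage: rank library entries by Euclidean distance to the query's features."""
    query = extract_features(query_image)
    ranked = sorted(library.items(), key=lambda item: math.dist(query, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]
```

In a real system the feature library would be stored with an index structure rather than held in a dictionary, and `extract_features` would return the texture and shape descriptors discussed in the following sections.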

COMMON TECHNIQUES OF CONTENT-BASED MEDICAL IMAGE RETRIEVAL
The key techniques of a content-based medical image retrieval system mainly include medical image segmentation, extraction and marking of regions of interest, feature extraction, similarity matching and retrieval, and relevance feedback [2].

Medical Image Segmentation
In general, medical images focus on specific human tissues or organs. Medical image segmentation separates these structures from the rest of the image and isolates regions of interest, which supply local features for clinical diagnosis.
This reduces the amount of data to be processed, speeds up processing, and improves the accuracy of disease analysis and diagnosis.
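As a minimal illustration of reducing an image to a region of interest, the sketch below uses simple global thresholding followed by cropping to the bounding box. Thresholding is an assumed technique chosen for brevity, not the segmentation method used in this paper; `segment_roi` and `crop` are hypothetical helper names, and images are 2-D lists of gray values.

```python
def segment_roi(image, threshold):
    """Bounding box (top, left, bottom, right) of pixels >= threshold, or None if there are none."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value >= threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

def crop(image, box):
    """Keep only the pixels inside the bounding box (inclusive on all sides)."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

On a 4 × 4 image whose bright pixels occupy a 2 × 2 block, later stages process 4 pixels instead of 16, which is the reduction in data volume the paragraph describes.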

Features of Image Gray Value
Gray value is one of the basic properties of an image and is relatively robust and stable. The gray-level histogram directly reflects the overall distribution of gray values, but it carries no information about where those gray values are located in the image. A mean gray-level histogram computed over all normal images in the medical image database can serve as a reference gray-value distribution, and comparing an image's histogram against it can help distinguish normal images from abnormal ones [3].
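The sketch below computes a normalized gray-level histogram, averages the histograms of several images into a reference distribution, and compares two histograms. Histogram intersection is an assumed comparison measure chosen for illustration; the function names are hypothetical, and gray values are assumed to be integers in `[0, levels)`.

```python
def gray_histogram(image, levels=256):
    """Normalized gray-level histogram: fraction of pixels at each gray value."""
    pixels = [v for row in image for v in row]
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    return [c / len(pixels) for c in counts]

def mean_histogram(histograms):
    """Bin-wise mean of several histograms, e.g. of all normal images in the database."""
    n = len(histograms)
    return [sum(h[i] for h in histograms) / n for i in range(len(histograms[0]))]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Two images with the same gray values arranged differently score 1.0 against each other, which illustrates the lack of spatial information noted above.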

Texture Features
Texture describes the spatial distribution of gray values in a medical image and corresponds closely to a doctor's visual intuition. Depending on how they model the relationships among pixels, texture analysis methods can be grouped into statistical, spectral, model-based, and structural methods. The statistical method, the most frequently used, describes texture properties such as uniformity, directionality, and coarseness by computing various statistics of the image's gray-level distribution.
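To make the statistical method concrete, the sketch below computes common first-order texture statistics from the gray-level histogram: mean, variance, a normalized smoothness measure, uniformity, and entropy. This particular set of descriptors is an assumption drawn from standard textbook practice, not a list taken from this paper. First-order statistics capture uniformity and roughness but not directionality, which requires second-order methods such as the co-occurrence matrix.

```python
import math

def statistical_texture(image, levels=256):
    """First-order texture statistics computed from the normalized gray-level histogram."""
    pixels = [v for row in image for v in row]
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    p = [c / len(pixels) for c in counts]
    mean = sum(z * pz for z, pz in enumerate(p))
    variance = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    return {
        "mean": mean,
        "variance": variance,
        # Smoothness R = 1 - 1 / (1 + sigma^2), with sigma^2 normalized by (L - 1)^2.
        "smoothness": 1 - 1 / (1 + variance / (levels - 1) ** 2),
        # Uniformity is 1.0 for a constant image and smaller when gray levels vary.
        "uniformity": sum(pz * pz for pz in p),
        # Entropy in bits; 0 for a constant image.
        "entropy": -sum(pz * math.log2(pz) for pz in p if pz > 0),
    }
```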

Image Shape Feature
The shape feature is an important property of medical images and plays a significant role in medical image retrieval systems. Common shape descriptors include Freeman chain codes, quadratic curves, B-splines, superquadrics, and shape approximation based on the wavelet transform.
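A Freeman chain code encodes a boundary as a sequence of moves between 8-connected pixels. The sketch below assumes the boundary is already available as an ordered list of `(row, col)` pixels and uses the usual convention of 0 = east with codes increasing counter-clockwise (rows grow downward). `first_difference` makes the code independent of the boundary's orientation for rotations by multiples of 90°.

```python
# (row step, column step) -> Freeman code; 0 = east, 2 = north, 4 = west, 6 = south.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(boundary, closed=True):
    """8-direction chain code of an ordered list of (row, col) boundary pixels."""
    points = list(boundary) + ([boundary[0]] if closed else [])
    code = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        code.append(DIRECTIONS[step])
    return code

def first_difference(code):
    """Number of 45-degree counter-clockwise turns between consecutive codes (cyclic)."""
    return [(b - a) % 8 for a, b in zip(code, code[1:] + code[:1])]
```

For the 2 × 2 square traversed as (0,0), (0,1), (1,1), (1,0), the chain code is `[0, 6, 4, 2]` and its first difference is `[6, 6, 6, 6]`, i.e. four identical right-angle turns.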

Similarity Match and Retrieval
Early techniques for high-dimensional medical image indexing, based on multidimensional hash tables and spatial grid structures, achieved only modest results. Mature high-dimensional indexing techniques now include bucketing, clustering, neural networks, quadtrees, and k-d trees. The self-organizing map (SOM) tree index, for example, learns without human supervision, clusters data dynamically, and supports arbitrary similarity measures, which gives it wide application potential.
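Among the index structures listed above, the k-d tree is the simplest to illustrate. The sketch below builds a k-d tree over feature vectors and answers nearest-neighbor queries with branch pruning. It is a generic textbook version, not the SOM tree index, and the dictionary-based node layout is an assumption made for brevity.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from a list of equal-length feature tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return the stored point with the smallest Euclidean distance to `target`."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(target, point) < math.dist(target, best):
        best = point
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the current best match.
    if abs(diff) < math.dist(target, best):
        best = nearest(far, target, best)
    return best
```

The pruning step is what lets the index skip most of the library; in very high dimensions it prunes less effectively, which is one motivation for the clustering and neural-network indexes mentioned above.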

Interactive Feedback Technique
The accuracy of a content-based medical image retrieval system may suffer from the gap between the image features and the user's understanding of the images. With interactive (relevance) feedback, the user does not need to set weights for the features; instead, the user marks returned images as similar or dissimilar to the desired image, and the system automatically adjusts the retrieval result according to this feedback. This yields better retrieval results and narrows the gap.
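One classic way to turn such relevance judgments into an adjusted query is Rocchio's formula, which moves the query feature vector toward the centroid of images marked relevant and away from the centroid of those marked non-relevant. This is a well-known technique swapped in for illustration, not necessarily the feedback method used here; the default weights `alpha`, `beta`, and `gamma` are conventional values, not values from this paper.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    dim = len(query)

    def centroid(vectors):
        # With no feedback of a given kind, that term contributes nothing.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    positive, negative = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, positive, negative)]
```

The updated vector becomes the query for the next retrieval round, so repeated feedback gradually pulls the results toward the images the user marked as relevant.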

Introduction to the Proposed Medical Image Retrieval Algorithm
The improved content-based medical image retrieval algorithm proposed in this paper organically combines the textual information and the content of medical images. It automatically extracts the textual information for an initial retrieval and then performs content-based retrieval on the combined image features. Because manually assigned keywords are not required, and because only the subset of images left after the initial text-based retrieval has to be searched by features, the amount of computation decreases and retrieval efficiency improves. Several experiments on CT images of the human colon demonstrate the algorithm's efficiency [4].
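The two-stage idea can be sketched as a keyword filter followed by content-based ranking of only the surviving images. The record layout (`id`, `text`, `features`) and the Euclidean ranking are assumptions for illustration; the default `top_k=30` mirrors the 30 images evaluated in the experiments.

```python
import math

def two_stage_retrieve(records, keywords, query_features, top_k=30):
    """Stage 1: keep records whose text fields match every keyword.
    Stage 2: rank only those survivors by feature distance to the query."""
    candidates = [
        r for r in records
        if all(r["text"].get(field) == value for field, value in keywords.items())
    ]
    candidates.sort(key=lambda r: math.dist(query_features, r["features"]))
    return [r["id"] for r in candidates[:top_k]]
```

Feature distances are computed only for images that pass the text filter, which is where the reduction in computation comes from. An image that is close in feature space but fails the text filter (for example, a different scanning posture) is never returned.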

Construction of the Content-Based Medical Image Retrieval System
Fig. (2) shows the framework of the CBMIR system proposed in this paper. It contains the following parts:
1) A database of medical images, text information, and image features;
2) An image import module, which filters images by format, separates the text information from the image data, and files each into its respective database;
3) An image feature extraction module, which extracts the images' features and files them in the feature library;
4) A similarity matching module [5], which matches the query image's feature vector against those in the library and returns medical images ranked by similarity;
5) Dynamic retrieval: the user can filter out unrelated images with an initial retrieval and then find qualifying images with a secondary retrieval on low-level features. The user can also go directly to the secondary retrieval.

Extraction of Image Text Information
DICOM, which is recognized worldwide, is the standard for the transmission, display, and storage of digital medical images. A DICOM file consists of a file header and image data; the header contains identifying information. Conventional CBMIR systems mainly extract patient information from the header, such as name, age, hospital, and scan time [6].
To improve retrieval, the algorithm proposed in this paper also extracts more retrieval-relevant information, such as the location of the organ, the image type, and the scanning posture. For better results, the user can first filter by text information and then perform a secondary retrieval on low-level features of the medical images. Table 1 shows the text information of a CT image of the human colon.
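A minimal sketch of the text stage, assuming the DICOM headers have already been parsed into dictionaries keyed by standard DICOM attribute keywords (a library such as pydicom can read these headers, but it is not used here). Indexing `Modality`, `BodyPartExamined`, and `PatientPosition` is an assumption that approximates the image type, organ location, and scanning posture mentioned above; `build_text_index` and `text_filter` are hypothetical names.

```python
from collections import defaultdict

# DICOM attribute keywords used for the text stage (assumed to be present in the headers).
INDEXED_KEYWORDS = ("Modality", "BodyPartExamined", "PatientPosition")

def build_text_index(headers):
    """Inverted index: (keyword, normalized value) -> set of image ids."""
    index = defaultdict(set)
    for image_id, header in headers.items():
        for keyword in INDEXED_KEYWORDS:
            value = header.get(keyword)
            if value is not None:
                index[(keyword, str(value).strip().upper())].add(image_id)
    return index

def text_filter(index, **criteria):
    """Ids of images whose headers match every keyword=value criterion (case-insensitive)."""
    result = None
    for keyword, value in criteria.items():
        ids = index.get((keyword, str(value).strip().upper()), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()
```

For example, `text_filter(index, Modality="CT", BodyPartExamined="COLON", PatientPosition="FFS")` keeps only feet-first supine colon CT images, and only those are passed on to the feature-based stage.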

Extraction of Low-Level Features of Medical Images
Most medical images are gray-level images without distinct patterns and with fine textures. Texture analysis techniques such as the gray-level co-occurrence matrix (GLCM) describe the direction and magnitude of gray-level variation, as well as the direction of repetition and the coarseness of a medical image's texture [7]. For a direction θ and a distance of d pixels, the GLCM gives the probability that a pair of pixels has gray values i and j respectively:

$$P(i, j \mid d, \theta) = \frac{\#\{(x, y) : f(x, y) = i,\; f(x + \Delta x,\, y + \Delta y) = j\}}{\#\{(x, y) : (x + \Delta x,\, y + \Delta y) \text{ lies inside the image}\}}$$

where f(x, y) is the gray value at pixel (x, y) and (Δx, Δy) is the displacement of d pixels along direction θ, for example (d, 0) for θ = 0°.
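The co-occurrence probabilities P(i, j | d, θ) can be computed by counting pixel pairs separated by a fixed offset and normalizing by the number of such pairs. The sketch below is a minimal, unoptimized version. It assumes gray values have already been quantized to integers in `[0, levels)`, maps the four standard angles to row/column offsets with rows increasing downward, and counts each ordered pair once (a non-symmetric GLCM), which is one of several common conventions.

```python
def glcm(image, d=1, angle=0, levels=8):
    """Normalized co-occurrence matrix P[i][j] for pixel pairs d pixels apart along `angle`.

    `angle` is 0, 45, 90 or 135 degrees; rows grow downward, so 45 degrees points
    up and to the right. Each ordered pair is counted once (non-symmetric GLCM).
    """
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[angle]
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[n / total for n in row] for row in counts]
```

Libraries such as scikit-image provide optimized GLCM routines; the explicit loop here only shows what is being counted.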

Extraction of Image Shape Feature
Medical image shape descriptors currently fall into two types: area-based (region-based) descriptors and contour-based descriptors. Contour-based descriptors, such as chain codes, use only the boundary of a shape, whereas area-based descriptors cover the whole shape region and are therefore more resistant to disturbances such as noise. Hu moment invariants and Zernike moments are area-based descriptors. Hu's seven moment invariants, for example, are invariant to scale, rotation, and translation, and for these properties they are widely applied [8].
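The sketch below computes normalized central moments and the first two of Hu's seven invariants, $\phi_1 = \eta_{20} + \eta_{02}$ and $\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$; the remaining five are built from third-order moments in the same way. It is a direct, unoptimized illustration that treats the image as a 2-D list of intensities, not an implementation taken from this paper.

```python
def _centroid(image):
    """Total mass m00 and centroid (xbar, ybar); x is the column index, y the row index."""
    m00 = sum(v for row in image for v in row)
    xbar = sum(c * v for row in image for c, v in enumerate(row)) / m00
    ybar = sum(r * v for r, row in enumerate(image) for v in row) / m00
    return m00, xbar, ybar

def normalized_central_moment(image, p, q):
    """eta_pq = mu_pq / mu00^(1 + (p + q) / 2), where mu00 equals m00."""
    m00, xbar, ybar = _centroid(image)
    mu_pq = sum((c - xbar) ** p * (r - ybar) ** q * v
                for r, row in enumerate(image) for c, v in enumerate(row))
    return mu_pq / m00 ** (1 + (p + q) / 2)

def hu_first_two(image):
    """First two Hu moment invariants (phi1, phi2)."""
    eta20 = normalized_central_moment(image, 2, 0)
    eta02 = normalized_central_moment(image, 0, 2)
    eta11 = normalized_central_moment(image, 1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2
```

Rotating a shape by 90° or shifting it inside a larger frame leaves both values unchanged, which is the invariance property that makes these moments useful as retrieval features.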

Integration of Multiple Medical Image Features
No retrieval based on a single image feature can take all factors into account. To compensate for this, multiple features must be integrated, with different weights set according to the user's needs.
For retrieval with texture and shape features, the user assigns a weight to each feature according to his or her needs, and the similarity of two images is obtained as the linear weighted sum of the per-feature similarities [9]. If $S_1$ is the texture similarity, $S_2$ is the shape similarity, and $W_1$ and $W_2$ are their respective weights, then the combined similarity is:

$$S = W_1 S_1 + W_2 S_2$$
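A minimal sketch of the weighted fusion $S = W_1 S_1 + W_2 S_2$, with the weights rescaled to sum to 1 so that a ratio such as 7:3 can be passed directly. Converting a feature distance into a similarity with `1 / (1 + d)` is an assumed mapping, used here only so that distances on different scales become comparable similarities in (0, 1].

```python
def distance_to_similarity(distance):
    """Map a non-negative feature distance to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + distance)

def fused_similarity(similarities, weights):
    """S = W1*S1 + W2*S2 + ..., with the weights rescaled to sum to 1."""
    if len(similarities) != len(weights):
        raise ValueError("need one weight per feature similarity")
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, similarities))
```

With a texture similarity of 0.8, a shape similarity of 0.5, and a 7:3 weight ratio, `fused_similarity([0.8, 0.5], [7, 3])` evaluates 0.7 × 0.8 + 0.3 × 0.5, i.e. approximately 0.71.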

Performance Evaluation of the Retrieval
Several criteria are used to evaluate medical image retrieval (MIR); the most common are precision and recall. These criteria require a preset threshold on image similarity, but because the number of medical images is massive and judging image similarity is subjective, such a threshold is difficult to determine. This study therefore evaluates the experimental results with the criterion recommended by the mature MPEG-7 standard:

$$P_N = \frac{1}{MN}\sum_{i=1}^{M}\sum_{k=1}^{N}\mathbf{1}\left[r_k(q_i) \in R(q_i)\right]$$

where $P_N$ is the correct rate of the first N retrieved images, $R(q_i)$ is the set of images sharing a given feature with sample image $q_i$, M is the preset number of sample images, $q_i$ are the sample images, and $r_k(q_i)$ is the k-th image retrieved for $q_i$.
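The correct rate $P_N$, i.e. the fraction of the first N results that are relevant, averaged over the M sample images, can be computed as below. The input layout (ranked result lists and relevant-image sets keyed by query id) is an assumption for illustration.

```python
def mean_precision_at_n(results, relevant, n):
    """P_N: mean over the sample images of the fraction of the first n results that are relevant.

    results:  dict query id -> ranked list of retrieved image ids
    relevant: dict query id -> set of image ids sharing the query's feature (R)
    """
    precisions = [
        sum(1 for image_id in ranked[:n] if image_id in relevant[q]) / n
        for q, ranked in results.items()
    ]
    return sum(precisions) / len(precisions)
```

With N = 30, as in the experiments, each query contributes the share of its 30 returned images that are relevant.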

Ordering Method for Medical Images
Another common evaluation criterion for medical image retrieval is the ordering (ranking) method. Its principle is that the more similar a retrieved image is to the sample image, the higher it should be ranked. The ordering equation adopted in this paper uses the following quantities: R is the set of relevant images, T is the set of the first N relevant images, and P is the rank (serial number) of an image in the result list. The closer the ordering value is to 1, the better the ordering [10].
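The sketch below shows one hypothetical ordering score with the property described above (values closer to 1 are better): the ideal sum of ranks divided by the actual sum of ranks of the relevant images found among the first N results. This score is an illustrative assumption and should not be read as the ordering equation adopted in this paper.

```python
def ordering_value(ranked, relevant, n):
    """Hypothetical ordering score in [0, 1]: ideal rank sum / actual rank sum.

    Only relevant images appearing in the first n results are counted. The score is
    1.0 when those k images occupy ranks 1..k and shrinks as they move down the list.
    """
    positions = [p for p, image_id in enumerate(ranked[:n], start=1) if image_id in relevant]
    if not positions:
        return 0.0
    k = len(positions)
    return (k * (k + 1) / 2) / sum(positions)
```

For example, two relevant images at ranks 1 and 2 give 3 / 3 = 1.0, while the same two images at ranks 2 and 4 give 3 / 6 = 0.5.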

Preparation for the Experiments of Image Retrieval
The data come from an American human colonoscopy database; each record contains CT images in both the side-lying and the supine posture. Resolution: 512 × 512; number of scan slices: 316 to 497. In total, 200 CT images of the human colon were chosen at random.

Procedures and Results of the Experiment
To test the performance of the retrieval algorithms, three retrieval experiments were conducted, and the 30 most relevant images returned by each were used to evaluate the results. Fig. (3) shows an example from the medical image processing system.
1) Single-feature medical image retrieval (experiment one): the gray-level co-occurrence matrix was computed in four directions (0°, 45°, 90°, and 135°), the texture features contrast, inverse difference moment, and entropy were derived from it, and retrieval was performed by matching these texture feature values.
2) Integrated-feature medical image retrieval (experiment two): the user assigned weights to the texture and shape features according to his or her needs and retrieved with the combined features; the texture-to-shape weight ratio was set to 7:3.
3) Text-and-content-based medical image retrieval proposed in this paper (experiment three): with supine-posture images as the samples, keyword retrieval was performed first, followed by retrieval on the integrated image features with a texture-to-shape weight ratio of 7:3.
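Given one normalized co-occurrence matrix per direction, the three texture features used in experiment one can be computed and concatenated into a 12-element vector (3 features × 4 directions), which retrieval then compares between images. The sketch interprets the second feature as the inverse difference moment $\sum_{i,j} p(i,j) / (1 + (i-j)^2)$ and uses base-2 logarithms for entropy; both choices are assumptions, and the function names are hypothetical.

```python
import math

def glcm_features(p):
    """Contrast, inverse difference moment and entropy of a normalized GLCM p[i][j]."""
    contrast = idm = entropy = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            contrast += (i - j) ** 2 * pij
            idm += pij / (1 + (i - j) ** 2)
            if pij > 0:
                entropy -= pij * math.log2(pij)
    return contrast, idm, entropy

def texture_vector(glcms_by_angle):
    """Concatenate the three features of the 0, 45, 90 and 135 degree matrices."""
    vector = []
    for angle in (0, 45, 90, 135):
        vector.extend(glcm_features(glcms_by_angle[angle]))
    return vector
```

A matrix concentrated on its diagonal (neighboring pixels with equal gray values) yields zero contrast and a high inverse difference moment, whereas off-diagonal mass raises contrast, so the two features move in opposite directions for smooth versus busy textures.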

Evaluation of the Medical Image Retrieval Experiments
Ranking the above experiments by mean retrieval correct rate $P_N$ and mean ordering value leads to the following conclusions:
1) Shape-based image feature retrieval outperforms texture-based image feature retrieval. Because the gray-level co-occurrence matrix reflects global features and the texture of these images varies considerably, shape-based retrieval still outperforms texture-based retrieval, although its advantage is not large. The comparative data are shown in Table 2.
2) Retrieval on integrated texture and shape features outperforms retrieval based on a single image feature. Because texture-based and shape-based retrieval complement each other, combining them improves both the correct rate and the ordering.
3) Retrieval based on text and low-level features performs best. Because the user first retrieves keywords in the text database to extract all candidate images and only then matches them on low-level features, the search range shrinks and efficiency improves.

CONCLUSION
Based on a study of text-based and content-based retrieval algorithms, this paper proposed an improved medical image retrieval system. By combining the advantages of text-based and content-based retrieval, it can help doctors analyze and compare medical images more comprehensively and accurately, and its retrieval results agree well with visual perception.
Although this paper emphasizes retrieval with integrated features, it has discussed only two of them (texture and shape features). Future work will incorporate more distinctive low-level features and will address retrieval of the doctor's region of interest, so as to achieve even better retrieval results.