RULES OF THUMB FOR EVALUATING MACHINE VISION APPLICATIONS

So you think you have a machine vision application. You want to determine whether the project is at least remotely feasible, and you don't want to rely on a company's salesman to do the evaluation. How does one go about doing that? There are several "rules of thumb" that can be used to get at least some measure of feasibility. These all start with a fundamental understanding of how a computer operates on a television image to sample and quantize the data. Understanding what happens is relatively straightforward once one recognizes that the TV image is closely analogous to a photograph.

The computer operating on the television image in effect samples the data in object space into a finite number of spatial (2D) data points, which are called pixels. Each pixel is assigned an address in the computer and a quantized value, which can vary from 0 to 63 in some machine vision systems or 0 to 255 in others. The actual number of sampled data points is dictated by the camera properties, the sampling rate of the analog to digital converter, and the memory format of the picture buffer, or frame buffer as it is called.

Today, more often than not, the limiting factor is the television camera that is being used. Most machine vision vendors today use cameras with solid state photosensor arrays on the order of 500 by 500, so one can make certain judgements about an application just by knowing this figure and assuming each pixel is approximately square. For example, given that the object you are viewing is going to take up a one inch field of view, the size of the smallest piece of spatial data in object space will be on the order of 2 mils, or one inch divided by 500. In other words, the data associated with a pixel in the computer will reflect a geographic region on the object on the order of 2 mils by 2 mils.

One can thus very quickly establish what the smallest spatial data point in object space will be for any application: X (mils) = largest dimension of the field of view (mils) / 500. Significantly, this may not be the size of the smallest detail a machine vision system can observe in the application. The nature of the application, the contrast associated with the detail that you want to detect, and positional repeatability are the principal factors that will also contribute to the size of the smallest detail that can be seen by the machine vision system.
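The pixel size rule can be written as a short Python sketch. The function name and the 500-pixel default are illustrative choices that follow the figures in the text; they do not describe any particular camera.

```python
def pixel_size_mils(field_of_view_inches, pixels_across=500):
    """Size of one pixel in object space, in mils (thousandths of an inch)."""
    return field_of_view_inches * 1000.0 / pixels_across


# Pixel size for a few fields of view, assuming a 500 x 500 camera.
for fov in (1.0, 4.0, 10.0):
    print(f"{fov:4.1f} inch field of view: {pixel_size_mils(fov):.1f} mils per pixel")
```

For the one inch field of view used in the text this gives 2 mils per pixel; a ten inch field of view gives 20 mils per pixel.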

The nature of the application refers to exactly what you want to do with the vision system: verify that an assembly is correct, make a dimensional measurement on the object, locate the object in space, detect flaws or cosmetic defects on the object, read characters or recognize the object. Contrast has to do with the difference in shade of gray between what you want to discriminate and the background, for example.

Positional repeatability is in effect just that: how repeatably the object will be positioned in front of the camera. If it cannot be positioned precisely the same way each time, then the field of view will have to be opened up to include the entire area in which one can expect to find the object. This in turn means that there will be fewer pixels covering the object itself. Vibration is another issue that can affect the size of a pixel, as typically does motion toward or away from the camera, since optical magnification may then become a factor, increasing or decreasing the size of the spatial data point in object space.
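As a rough sketch of the effect of positional uncertainty, the following Python fragment widens the field of view to cover the part plus its positioning error and reports how many pixels remain on the part. The function name, the +/- error model along a single axis, and the 500-pixel default are assumptions made for illustration.

```python
def pixels_on_part(part_size_inches, position_error_inches, pixels_across=500):
    """Effect of opening up the field of view to cover positioning error.

    Assumes (for illustration only) that the part may sit anywhere within
    +/- position_error_inches of its nominal position along one axis.
    Returns (field of view in inches, pixel size in mils, pixels on the part).
    """
    field_of_view = part_size_inches + 2.0 * position_error_inches
    pixel_size_mils = field_of_view * 1000.0 / pixels_across
    pixels_covering_part = part_size_inches / field_of_view * pixels_across
    return field_of_view, pixel_size_mils, pixels_covering_part


# A one inch part that can be off by a quarter inch either way.
fov, pixel_mils, on_part = pixels_on_part(1.0, 0.25)
print(f"field of view {fov:.2f} in, pixel {pixel_mils:.1f} mils, "
      f"{on_part:.0f} pixels across the part")
```

In this example the field of view grows to 1.5 inches, the pixel grows from 2 mils to 3 mils, and only about 333 of the 500 pixels fall on the part.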

Let's take the generic applications one at a time. Suppose you want to verify that all of the features on an assembly are in place. If there is high contrast between each of the features and the background, or between a feature being in place and not in place, then the smallest feature one can expect to detect would have to cover an area of about two pixels by two pixels. If, on the other hand, the contrast is relatively low, then a good rule of thumb is that the feature should cover at least 1% of the field of view, or in the case of 500 by 500 pixels a total of some 2500 pixels. So, knowing the size of a pixel in object space, one can multiply the area of a pixel by 4 (two by two pixels) or by 2500, depending on contrast, to determine the area of the smallest detectable feature.
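The assembly verification rule can be sketched in Python as follows. The function name and its arguments are illustrative; the 2 x 2 pixel and 1% figures are the ones quoted in the text.

```python
def smallest_feature_area_sq_mils(field_of_view_inches, high_contrast,
                                  pixels_across=500):
    """Area of the smallest detectable feature, in square mils."""
    pixel_mils = field_of_view_inches * 1000.0 / pixels_across
    pixel_area = pixel_mils ** 2
    if high_contrast:
        pixels_needed = 2 * 2                                  # two by two pixels
    else:
        pixels_needed = pixels_across * pixels_across // 100   # 1% of the view
    return pixels_needed * pixel_area


for label, high in (("high contrast", True), ("low contrast", False)):
    area = smallest_feature_area_sq_mils(1.0, high)
    print(f"{label}: about {area:.0f} square mils")
```

For a one inch field of view this works out to a feature of roughly 4 mils by 4 mils at high contrast, and roughly 100 mils by 100 mils (a tenth of an inch on a side) at low contrast.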

In the case of making dimensional measurements with a machine vision system, one can consider the 500 pixels in each direction as if they were 500 marks on a ruler. Significantly, just as a person making measurements with a ruler can interpolate where the edge of a feature falls between the marks, so, too, can a machine vision system. This ability to interpolate, however, is very application dependent. Today the claims of vision companies vary all the way from one third of a pixel to one tenth or one fifteenth of a pixel. For purposes of a rule of thumb, you can use one tenth of a pixel.

What will this mean for a dimensional measuring application? Metrologists have used a number of rules of thumb themselves for measuring instruments. For example, the accuracy and repeatability of the measurement instrument itself should be ten times better than the tolerance associated with the dimension being checked. Today this figure is frequently relaxed to one fourth of the tolerance. The other rule of thumb often used by metrologists is that the sum of repeatability and accuracy should be no more than one third of the tolerance.

So how does one establish what the repeatability of a vision system should be? Given the sub-pixel capability of one tenth of a pixel mentioned above and, as in the example, an object that is one inch on a side, the discrimination (the smallest change in dimension detectable with the measuring instrument) associated with the machine vision system as a measuring tool would be one tenth of the smallest spatial data point of two mils, or 0.0002". Repeatability will typically be +/- the discrimination value, or 0.0002".

Accuracy, which is determined by calibration against a standard, can be expected to run about the same. Hence, the sum of accuracy and repeatability in this example would be 0.0004". Using the three to one rule, the part tolerance should be no tighter than 0.0012" for machine vision to be a reliable metrology tool. In other words, if your part tolerance for this size part is on the order of +/-.001" or greater, the vision system would be suitable for making the dimensional check.
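The arithmetic of this example can be reproduced with a short Python sketch. The function name and parameter names are illustrative; the one tenth pixel interpolation, the assumption that accuracy equals repeatability, and the three to one rule are the figures used in the text.

```python
def tightest_tolerance_inches(field_of_view_inches, pixels_across=500,
                              subpixel_fraction=0.1, rule_ratio=3.0):
    """Tightest part tolerance the rule of thumb supports, in inches."""
    pixel_inches = field_of_view_inches / pixels_across
    discrimination = pixel_inches * subpixel_fraction   # smallest detectable change
    repeatability = discrimination                      # taken as +/- discrimination
    accuracy = discrimination                           # assumed about the same
    return rule_ratio * (repeatability + accuracy)


for fov in (1.0, 5.0):
    tol = tightest_tolerance_inches(fov)
    print(f"{fov:.0f} inch field of view: tolerance no tighter than {tol:.4f} inch")
```

A one inch field of view gives the 0.0012" figure from the text; a five inch field of view with the same camera gives 0.0060", which is why larger parts with the same tolerances become difficult.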

As you can see, as parts become larger while the tolerances stay the same, machine vision might not be an appropriate means of making the dimensional check, at least with area cameras that have only 500 x 500 discrete photosites. The same would be true if the part size stayed the same and the tolerances became tighter.

Using machine vision to perform a part location function one can expect to achieve basically the same results as making dimensional checks. That is, most vendors whose systems are suitable for performing part location claim an ability to perform that function to a repeatability and accuracy of +/- one tenth of a pixel. Using our example again, namely a one inch part, one would be able to use a vision system to find the position of that part to within +/-.0002".

For applications involving flaw detection, contrast is especially critical in determining what can be detected. Where contrast is extremely high, virtually white on black, it is possible to detect flaws that are on the order of one-third of a pixel. Significantly, one can detect these flaws but not actually measure or classify them. Flaws that are geometric in nature, for example scratches or porosity, can frequently be exaggerated by creative lighting and staging techniques. So if those were the only flaws one wanted to detect, and detection was all that was necessary, a rule of thumb would be that the flaw has to be greater than one-third of a pixel in size.

Where contrast is moderate, the rule of thumb associated with assembly verification, namely that the flaw cover an area of two by two pixels, would be appropriate. Classifying a flaw with moderate contrast would require that it cover a larger area, on the order of 25 pixels or so. Again, where the contrast associated with a flaw is relatively low, as is the case with many stains, the 1% of the field of view rule would hold, meaning the flaw should cover 2500 or so pixels. Significantly, if one is trying to detect flaws against a background that is itself a varying pattern (stains on a printed fabric, for example), the chances are that one would only be able to detect very high contrast flaws.
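The flaw detection thresholds can be collected into a small lookup table, sketched here in Python. The table layout and names are illustrative. The text does not say whether "one-third of a pixel" refers to width or area; treating it as a fraction of a pixel's area is an assumption made only so that every entry has the same units.

```python
PIXELS_ACROSS = 500  # camera assumed throughout the text

# Minimum flaw coverage in pixels for each (contrast, task) pair.
MIN_FLAW_PIXELS = {
    ("high", "detect"): 1.0 / 3.0,      # assumption: read as pixel area
    ("moderate", "detect"): 2 * 2,      # two by two pixels
    ("moderate", "classify"): 25,
    ("low", "detect"): 2500,            # 1% of a 500 x 500 field of view
}


def min_flaw_area_sq_mils(contrast, task, field_of_view_inches):
    """Smallest flaw area, in square mils, for the given contrast and task."""
    pixel_mils = field_of_view_inches * 1000.0 / PIXELS_ACROSS
    return MIN_FLAW_PIXELS[(contrast, task)] * pixel_mils ** 2


for key in MIN_FLAW_PIXELS:
    area = min_flaw_area_sq_mils(key[0], key[1], 1.0)
    print(f"{key[0]:8s} {key[1]:8s}: about {area:.1f} square mils")
```

Combinations the text does not cover, such as classifying a low contrast flaw, are deliberately left out of the table and raise a KeyError.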

For applications involving optical character recognition (OCR) or optical character verification, the rule of thumb is that the stroke of the smallest character should be at least three pixels wide. A typical character should cover an area on the order of 20-25 pixels by 20-25 pixels. The critical issue here, then, is the length of the string of characters that one wants to read. At 20 pixels across a character and two pixels of spacing between characters, the maximum length of the character string would be on the order of 22 characters in order to fit into a camera with 500 photosites across. In optical character recognition/verification applications, a bold font style is desirable. In general it is also true that only one font style can be handled at a given time.
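A minimal Python sketch of the character string calculation follows. The function names are illustrative, and the second function, which turns the three pixel stroke rule into a maximum field of view, is a derived convenience rather than something stated in the text.

```python
def max_characters(pixels_across=500, char_width_px=20, spacing_px=2):
    """Longest character string that fits across the sensor."""
    return pixels_across // (char_width_px + spacing_px)


def max_field_of_view_inches(stroke_width_mils, pixels_across=500,
                             min_stroke_px=3):
    """Widest field of view that keeps the stroke at least min_stroke_px wide."""
    largest_pixel_mils = stroke_width_mils / min_stroke_px
    return largest_pixel_mils * pixels_across / 1000.0


print("characters per line:", max_characters())
print("field of view for a 15 mil stroke:",
      max_field_of_view_inches(15.0), "inches")
```

With the figures from the text the string is limited to 22 characters. A stroke width of 15 mils (a hypothetical value) would limit the field of view to 2.5 inches.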

Another rule of thumb is that the best OCR systems have a correct read rate on the order of 99.9%. In other words, one out of every thousand characters will be either misread or a "no-read". The impact of this should be evaluated. For example, if 300 objects per minute are to be read, and 0.1% are sorted as "no-reads", in one hour you would have approximately 20 products (300 x 60 x 0.001 = 18) to be read manually. Is this acceptable? This is the best case scenario. The worst case would be if they were misread.
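The read rate example can be checked with a few lines of Python. The function name is illustrative, and, like the example in the text, the sketch applies the failure rate once per object rather than once per character.

```python
def failed_reads_per_hour(objects_per_minute, read_rate=0.999):
    """Expected number of objects per hour that are not read correctly."""
    return objects_per_minute * 60 * (1.0 - read_rate)


for rate in (0.999, 0.99):
    misses = failed_reads_per_hour(300, rate)
    print(f"read rate {rate:.1%}: about {misses:.0f} objects per hour")
```

At 300 objects per minute, a 99.9% read rate leaves about 18 objects per hour for manual handling, and a 99% read rate leaves about 180.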

When it comes to pattern recognition applications, a reasonable rule of thumb is that the differences between the patterns should be characterized by something on the object that is greater than 1% of the field of view or again on the order of 2500 pixels. Significantly, the gray shade pattern can be a major factor in making it possible to see pattern differences or to recognize patterns that have differences of far less than 2500 pixels. This would be the case, for example, where both geometry and color are factors.

Significantly, where more than one generic application is involved in the actual application, the worst case scenario should be determined and used as the criterion to establish feasibility. Throughout this rule of thumb analysis, the dictating factor has been the number of photosites in the camera. Significantly, today solid state cameras do exist that have up to 1,000 x 1,000 photosites. These cameras, however, are not cheap. It is even possible that they would be more expensive than the vision system itself. Furthermore, few commercialized machine vision systems have the capacity to process so many pixels and make vision decisions at anywhere near real time rates.

An alternative to capturing images of an object with an area camera, however, would be to use a linear array camera. There are several vision companies that offer linear array based vision systems in which the linear arrays have up to 2,000 photosites. Using a linear array, one would have to move the object under the camera or move the camera over the object in order to capture a two dimensional image. Significantly, if the object is going to be moved under the camera, the speed with which it passes must be well regulated. The operating speed of the camera, in combination with the speed of the object as it passes underneath, will in effect dictate the size of the pixel in the direction of travel.

Typically, vision systems that use these principles operate at pixel rates of up to 2 megahertz. For a 2,000 element array that means scanning 1,000 lines per second (2,000,000/2,000) in the direction of travel. For example, given an object speed of 10 inches per second (10,000 mils per second) and a sample rate of 1,000 lines per second, the effective pixel size in the direction of travel will be 10 mils (10,000/1,000). So when evaluating machine vision applications, you may want to consider whether the application can be addressed with a linear array based technique. In these instances the size of the detail one can discriminate in object space along the array would be proportionally better. For example, with a 2,000 element linear array, resolution along the array would be four times better than with an area camera having 500 x 500 photosites.
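The linear array arithmetic can be sketched in Python as follows. The function names are illustrative, and the one inch field of view across the array is an assumed value carried over from the earlier example.

```python
def line_rate_per_second(pixel_rate_hz, array_elements):
    """Lines scanned per second for a given pixel clock and array length."""
    return pixel_rate_hz / array_elements


def pixel_size_along_travel_mils(speed_inches_per_second, lines_per_second):
    """Effective pixel size in the direction of travel, in mils."""
    return speed_inches_per_second * 1000.0 / lines_per_second


def pixel_size_across_array_mils(field_of_view_inches, array_elements):
    """Pixel size along the array itself, in mils."""
    return field_of_view_inches * 1000.0 / array_elements


lines = line_rate_per_second(2_000_000, 2000)
along = pixel_size_along_travel_mils(10.0, lines)
across = pixel_size_across_array_mils(1.0, 2000)
print(f"{lines:.0f} lines/s, {along:.1f} mils along travel, "
      f"{across:.2f} mils across the array")
```

With the figures from the text this gives 1,000 lines per second and a 10 mil pixel in the direction of travel. Across a one inch field of view the 2,000 element array gives 0.5 mil pixels, so the pixel is far from square unless the object moves more slowly.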

Significantly, these are meant to be rules of thumb and should be used only as such in the evaluation of an application. Having performed this type of evaluation, however, you will be in a better position to decide whether or not to pursue an application. It will also keep you from wasting time with salesmen trying to convince you that your application is "a piece of cake".