MACHINE VISION
Machine vision is conceptually a relatively simple technology. There are many different executions, but for the most part machine vision involves combining television and computers. In its most straightforward form, it is the analysis of television pictures by computer. This concept transcends what is commonly referred to as machine vision and is generally associated with the broader field of electronic imaging.
In other words, machine vision is a subset of the field of electronic imaging, and specifically that subset associated with the application of electronic imaging technology or techniques in industrial manufacturing settings. In manufacturing, machine vision is employed for the purpose of control: quality control, process control, machine tool control, or robot control. As much as anything, the application of the technology of electronic imaging is what distinguishes machine vision from other subsets of the field, such as applications in medicine, offices, document scanning, etc.
Machine vision systems all begin with an image - a picture. In many ways the issues associated with a quality image in machine vision are similar to the issues associated with obtaining a quality image in a photograph. In the first place, quality lighting is required in order to obtain a bright enough reflected image of the object. Lighting should be uniformly distributed over the object. Non-uniform lighting will affect the distribution of brightness values that will be picked up by the television camera.
As is the case in photography, lighting tricks can be used in order to exaggerate certain conditions in the scene being viewed. For example, it is possible that shadows can in effect include high contrast information that can be used to make a decision about the scene being viewed.
The types of lamps that are used to provide illumination may also influence the quality of the image. For example, fluorescent lamps have a higher blue spectral output than incandescent lamps. While the blue spectral output is more consistent with the spectral sensitivity of the eye, higher infrared output is typically more compatible with the spectral sensitivity of solid state sensors that are used in machine vision.
It has been found that the sensitivity of human inspectors can be enhanced as a consequence of using softer lighting or fluorescent lamps with gases that provide more red spectral output; so too it may also be the case in machine vision. That is, the lamp's spectral output may influence the contrast associated with the specific feature one is attempting to analyze.
As in photography, machine vision uses a lens to capture a picture of the object and focus it onto a sensor plane. The quality of the lens will influence the quality of the image. Distortions and aberrations could affect the size of features in image space. Vignetting in a lens can affect the distribution of light across the image plane. Magnification of the lens has to be appropriate for the application. As much as possible the image of the object should fill the image plane of the sensor. Allowances have to be made for any registration errors associated with the position of the object and the repeatability of that positioning. The focal length and aperture have to be optimized in order to handle the depth of field associated with the object.
The imaging sensor that is used in the machine vision system will basically dictate the limit of discrimination of detail that will be experienced with the system. Imaging sensors have a finite number of discrete detectors and this number limits the number of spatial data elements that can be processed or into which the image will be dissected. In a typical television based machine vision system today the number of spatial data points is on the order of 400 to 500 horizontal x 400 to 500 vertical.
What this means basically is that the smallest piece of information that can be discriminated is going to be a function of the field of view. In photography, one can use panoramic optics to take a view of a mountain range; although a family might be in the picture in the foothills of the mountains, it is unlikely that you would be able to discriminate the family in the picture. On the other hand, using a different lens and moving closer to the family, one would be able to capture the facial expressions of each member, but the resulting picture would not include the peaks of the mountains.
So, for example, given that an application requires a one inch field of view and a sensor with the equivalent of 500 spatial data points is used, one would have a spatial data point that would be approximately 0.002 inch on a side. Significantly, the ability of machine vision today to discriminate details in a scene is generally better than the size of a spatial data point.
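The arithmetic behind this example can be sketched in a few lines of Python. The function name is illustrative only, and the sketch assumes the spatial data points are spread evenly across the field of view.

```python
def spatial_data_point_size(field_of_view, num_points):
    """Return the size of one spatial data point along one axis.

    field_of_view: width of the scene covered by the sensor (any length unit).
    num_points: number of spatial data points across that width.
    """
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    return field_of_view / num_points


# One inch field of view imaged onto 500 spatial data points.
size_in = spatial_data_point_size(1.0, 500)
print(f"{size_in:.3f} inch per spatial data point")  # prints 0.002
```

Doubling the field of view with the same sensor doubles the size of each data point, which is the trade-off the mountain range analogy describes.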
In a manner basically analogous to how an eye can see stars in a night sky because of the contrast associated with the star light, so too in machine vision techniques exist which allow systems to be able to discriminate details smaller than a spatial data element. Again, contrast is critical. The claims for subpixel sensitivity vary from vendor to vendor and depend very much on their execution and the application.
In all machine vision systems up until this point in our discussion, the information or the image has been in an analog format. For a computer to operate on the picture the analog image must be digitized. This operation basically consists of sampling at discrete locations along the analog signal that corresponds to a plot of time vs. brightness, and quantizing the brightness at that sample point.
The actual brightness value is dependent on: the lighting, the reflective property of the object, conditions in the atmosphere between the lighting and the object and between the object and the camera, and the specific detector sensitivity in the imaging sensor. Most vision systems today characterize the brightness into a value of between 0 and 255. The brightness so characterized is generally referred to as a shade of gray.
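As a rough illustration of sampling and quantizing, the sketch below samples a brightness signal at evenly spaced points and maps each sample to a gray shade between 0 and 255. The 0.0 to 1.0 signal range, the function names and the ramp test signal are assumptions made for the example; a real digitizer does this in hardware.

```python
def quantize(level, v_min=0.0, v_max=1.0, shades=256):
    """Map an analog brightness level to an integer gray shade 0..shades-1."""
    clamped = min(max(level, v_min), v_max)        # out-of-range input saturates
    fraction = (clamped - v_min) / (v_max - v_min)
    return min(int(fraction * shades), shades - 1)


def digitize_line(signal, num_samples):
    """Sample signal(t) for t in 0..1 at the center of num_samples equal intervals."""
    return [quantize(signal((i + 0.5) / num_samples)) for i in range(num_samples)]


def ramp(t):
    """Hypothetical scan line: dark on the left, bright on the right."""
    return t


samples = digitize_line(ramp, 8)
print(samples)  # [16, 48, 80, 112, 144, 176, 208, 240]
```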
For the most part today machine vision systems are monochromatic. Consequently, the color may also be a factor in the brightness value. That is, it is possible to have a shade of red and a shade of green (and so on) all of which would have the same brightness value. In many cases where color issues are a concern, filters are used in order to eliminate all colors that are not of interest to the particular application. In this way the gray shades are an indicator of the saturation level associated with a specific color.
At last we have a picture that has been prepared for a computer. In most machine vision systems today, the digitized image is stored in memory that is separated from the computer memory. This dedicated memory is referred to as a frame store - where frame is synonymous with the term used in television to describe a single picture. In some cases the dedicated hardware that includes the frame store also includes the analog-to-digital converter as well as other electronics to permit one to view images after processing steps have been conducted on the image to view the effects of these processing procedures.
Now the computer can operate on the image. The operation of the computer on the image is generally referred to as image processing. In addition to operating on the image, the computer is also used to analyze the image and make a decision on the basis of the analyzed image and perform an operation accordingly. What is typically referred to as the machine vision system is the combination of image processing, analysis and decision making techniques that are embodied in the computer.
A good analogy can be made to a tool box. Virtually all machine vision systems today include certain fundamental tools, much like a hammer, screwdriver or pliers. Beyond these, different suppliers have developed additional tools, more often than not driven by a specific class of applications. Consequently the description frequently given for machine vision as being an "idiot savant" is quite apropos. That is, most of the platforms are brilliant on one set of applications but "idiots", or simply not optimal, for other applications.
It is important, therefore, to select the vision platform or tool box with the most appropriate tools for an application. Significantly, no machine vision systems exist today that come anywhere near simulating the comprehensive image understanding capabilities that people have. It is noted that for many applications many different tools will actually do the job and in many cases without sacrificing performance. On the other hand, in some cases while the tools appear to do the job, performance might be marginal, in a manner analogous to when we attempt to use a flat head screwdriver in order to turn a screw with a Phillips head.
Image processing is generally performed on most images for basically two reasons: to improve or enhance the image and, therefore, make the decision associated with the image more reliable, and to segment the image or to separate the features of importance from those that are unimportant. Enhancement might be performed, for example, to correct for the non-uniformity in sensitivity from photo site to photo site in the imaging sensor, correct for distortion, correct for non-uniformity of illumination, to enhance the contrast in the scene, correct for perspective, etc.
These enhancement steps could be as simple as adding or subtracting a specific value to each shade of gray or can involve a variety of logical operations on the picture. There are many such routines. One routine that is commonly found as a tool for image processing in most vision platforms today is a histogram routine. This involves developing a frequency distribution associated with the number of times a given gray shade is determined.
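A minimal sketch of the two simplest routines mentioned here, a constant gray shade offset and a histogram, might look as follows. The image is assumed to be a list of rows of integer gray shades from 0 to 255; the function names are illustrative.

```python
def shift_gray(image, offset):
    """Add a constant to every gray shade, clamping the result to 0..255."""
    return [[min(255, max(0, g + offset)) for g in row] for row in image]


def gray_histogram(image, shades=256):
    """Count how many pixels take each gray shade."""
    counts = [0] * shades
    for row in image:
        for g in row:
            counts[g] += 1
    return counts


image = [[0, 0, 128],
         [128, 128, 255]]
hist = gray_histogram(image)
print(hist[0], hist[128], hist[255])  # 2 3 1
print(shift_gray(image, 10))          # [[10, 10, 138], [138, 138, 255]]
```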
One use of histograms is to improve contrast. This involves mathematically redistributing the histogram so that pixels are assigned to gray shades covering 0 to 255, for example. In an image with this type of contrast enhancement it could be easier to establish boundaries or easier to establish a specific gray shade level or threshold to use to binarize the image. Binarizing an image, or segmenting an image based on a threshold above which all pixels are turned on and below which all pixels are turned off, is a conventional segmentation tool included in most vision platforms and can be effective where high contrast exists.
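One simple way to do this is a linear stretch that maps the darkest gray shade found to 0 and the brightest to 255, followed by a fixed threshold. This is only one of several possible redistributions (histogram equalization is another), and treating pixels exactly at the threshold as "on" is an assumption of this sketch.

```python
def stretch_contrast(image, out_max=255):
    """Linearly rescale gray shades so they span 0..out_max."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:                      # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round((g - lo) * out_max / (hi - lo)) for g in row] for row in image]


def binarize(image, threshold):
    """Turn pixels at or above the threshold on (1) and the rest off (0)."""
    return [[1 if g >= threshold else 0 for g in row] for row in image]


low_contrast = [[100, 110],
                [120, 150]]
stretched = stretch_contrast(low_contrast)
print(stretched)                 # [[0, 51], [102, 255]]
print(binarize(stretched, 128))  # [[0, 0], [0, 1]]
```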
Where contrast in a scene is not substantial, segmentation based on edges may be more appropriate. Edges can be characterized as locations where gradients or gray shade changes take place. Both the gradient as well as the direction of change can be used as properties to characterize an edge. Significantly, edges can be caused by shadows as well as reflectance changes on the surface in addition to the boundaries of the object itself. Artifacts in the image may also contribute to edges. For example, unwanted porosity may also be characterized by increased edges.
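To make the gradient and its direction concrete, the sketch below estimates both at a single interior pixel using central differences. Central differences are a choice made for this example; commercial platforms use a variety of edge operators.

```python
import math


def gradient_at(image, r, c):
    """Central-difference gradient at interior pixel (r, c).

    Returns (magnitude, direction in degrees), where 0 degrees means the
    gray shade increases toward higher column numbers.
    """
    gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
    gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
    return math.hypot(gx, gy), math.degrees(math.atan2(gy, gx))


# Dark region on the left, bright region on the right.
image = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
magnitude, direction = gradient_at(image, 1, 1)
print(magnitude, direction)  # 95.0 0.0
```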
There are many different ways edges are characterized. One of the simplest is just using the fact that there are sharp gray scale changes at an edge. Significantly, however, edges in fact appear across several neighboring pixels and what one has is in fact a profile of an edge across the pixels. Because of this there are ways to mathematically discriminate the physical position of an edge to a value less than the size of the pixel. Again, there are many ways that these subpixel calculations have been made and the results are very application dependent. Consequently although claims are made of one part in ten or better subpixelling capability, it is important to understand that the properties of a given application can reduce the effectiveness of subpixelling techniques.
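One of the simplest subpixel schemes, shown here purely as an illustration, finds where a one-dimensional gray shade profile crosses the level halfway between its darkest and brightest values and interpolates linearly between the two neighboring pixels. It assumes a single clean edge in the profile; noise, blur or uneven lighting will degrade the estimate.

```python
def subpixel_edge(profile):
    """Estimate the edge position in a 1-D gray shade profile.

    Returns a fractional pixel index where the profile crosses the
    half-way level, or None if no crossing is found.
    """
    half = (min(profile) + max(profile)) / 2.0
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a != b and (a - half) * (b - half) <= 0:
            return i + (half - a) / (b - a)
    return None


# The edge is smeared across pixels 2 to 4 rather than being a clean step.
profile = [10, 10, 10, 60, 210, 210, 210]
position = subpixel_edge(profile)
print(round(position, 3))  # 3.333
```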
Having performed image processing routines to enhance and segment an image, the computer is now used to analyze the image. The specific analysis conducted is again going to be very application dependent. In the case of a robot guidance application, for example, a geometric analysis would typically be conducted on the segmented image. Looking at the thresholded segmented image or edge segmented image one would be able to calculate the centroid property and furnish this as a coordinate in space for the robot to pick up an object, for example.
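A minimal centroid calculation on a binarized image might look like this. The result is in pixel coordinates (row, column); converting it to a coordinate in the robot's space requires a calibration that is outside this sketch.

```python
def centroid(binary):
    """Return the (row, column) centroid of the 'on' pixels, or None if empty."""
    total = sum_r = sum_c = 0
    for r, row in enumerate(binary):
        for c, value in enumerate(row):
            if value:
                total += 1
                sum_r += r
                sum_c += c
    if total == 0:
        return None
    return sum_r / total, sum_c / total


binary = [[0, 0, 0, 0, 0],
          [0, 0, 1, 1, 0],
          [0, 0, 1, 1, 0],
          [0, 0, 0, 0, 0]]
print(centroid(binary))  # (1.5, 2.5)
```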
In the case of using vision systems to perform inspections of one type or another, there are literally hundreds of different types of analysis techniques that have emerged. The number of pixels associated with the binarized or thresholded picture, for example, could be counted. This could be a relatively simple measure of the completeness of an object. The number of transitions or times that one goes from black to white can be counted. The distance between transitions can be counted and can serve as a measurement between boundaries of an object. The number of pixels that are associated with an edge can be counted. Vectors associated with the direction of the gradient at an edge can be used as the analysis features. A model based on the edges can be derived where the edges can be characterized as vectors of a certain length and angle. Geometric features can be extracted from the enhanced image and used as the basis of decisions.
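Three of the simpler measurements listed above (pixel count, transition count, and distance between transitions) can be sketched as follows, assuming a binarized image stored as rows of 0 and 1. The function names are illustrative.

```python
def count_on_pixels(binary):
    """Total number of 'on' pixels in a binarized image."""
    return sum(sum(row) for row in binary)


def transition_positions(row):
    """Indices at which a row changes between 0 and 1."""
    return [i + 1 for i in range(len(row) - 1) if row[i] != row[i + 1]]


row = [0, 0, 1, 1, 1, 1, 0, 0]
edges = transition_positions(row)
print(len(edges))                  # 2 transitions
print(edges[1] - edges[0])         # 4 pixels between the two boundaries
print(count_on_pixels([row, row])) # 8
```

Multiplying the pixel distance by the size of a spatial data point converts it to a physical dimension.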
These same techniques can be used in conjunction with pattern recognition applications. In each case a pattern can be defined by one or more of the above mentioned features extracted from the image. For example, maybe a combination of the transition counts and edge pixels would be sufficient to make a judgement about patterns where that combination is sufficient to distinguish between the patterns. Another approach might be to use geometric properties to distinguish patterns. These might include length and width ratios, perimeter, etc.
The computer having reduced the image to a set of features used as the basis of analysis would typically then use a deterministic or probabilistic approach to analyze the features. A probabilistic approach is one that basically suggests that given a certain property associated with a feature, there is a high probability that the object is in fact good. So, for example, using the total number of pixels as an indication of the completeness of an object one would be able to suggest that if the total number of pixels exceeded say 10,000 there is a high probability that the object is complete. If less than 10,000 the object should be rejected because it would be characterized as incomplete. Some refer to this as goodness-of-fit criteria. It is also possible to set a boundary around this criterion. That is, it should fall between 10,000 and 10,500. An indication of a pixel count greater than 10,500 could be an indication, for example, of excess flashing.
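The pixel count window described above reduces to a pair of comparisons. In this sketch the limits are the ones used in the text, the result labels are made up for illustration, and counts exactly on a limit are assumed to be acceptable.

```python
def classify_by_pixel_count(count, low=10_000, high=10_500):
    """Judge completeness from the number of 'on' pixels."""
    if count < low:
        return "reject: incomplete"
    if count > high:
        return "reject: excess material"
    return "accept"


for count in (9_800, 10_250, 10_700):
    print(count, classify_by_pixel_count(count))
```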
A deterministic approach is one that will use physical feature properties as the criteria. For example, the distance between two boundaries has to be one inch +/- 0.005". The perimeter of the object must be 12 inches +/- 0.020". The pattern must match the following criteria in order to be considered a match: length/width ratio of a certain value, perimeter of a certain value, centroid of a given calculated value, etc.
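A tolerance check of this kind can be sketched as below. The nominal values and tolerances come from the examples in the text; the measured values and the dictionary layout are hypothetical.

```python
def within_tolerance(measured, nominal, tolerance):
    """True if measured lies within nominal +/- tolerance."""
    return abs(measured - nominal) <= tolerance


# feature name -> (nominal, tolerance), both in inches
criteria = {"boundary distance": (1.0, 0.005), "perimeter": (12.0, 0.020)}
# hypothetical measurements taken from one part
measured = {"boundary distance": 1.004, "perimeter": 12.015}

part_ok = all(within_tolerance(measured[name], nominal, tol)
              for name, (nominal, tol) in criteria.items())
print(part_ok)  # True
```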
In a deterministic mode each of the features can be associated with a vector in decision space. In a pattern recognition application, the features are combined into a feature vector, and the pattern whose known feature set lies at the shortest distance from that vector is the one that would be selected. This type of evaluation is referred to as decision theoretic. Another type of analysis is one based on syntactic techniques. In these cases, primitives associated with pieces of the image are extracted and the relationship between them is compared to a known data base associated with the image.
In other words, the primitives and their relationship to each other have to abide by a set of rules. Using syntactic techniques one may be able to infer certain primitives and their position knowing something about other primitives in the image and their position with respect to each other. This could be a technique to handle parts that might be overlapping and still be able to make certain decisions associated with those parts even though one cannot see them entirely.
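The shortest-distance selection used in the decision theoretic approach can be illustrated with a minimum-distance classifier. The pattern names and feature values below are invented for the example, and plain Euclidean distance on unscaled features is an assumption; in practice features with different units usually need to be normalized first.

```python
import math


def nearest_pattern(features, known_patterns):
    """Return the name of the known pattern closest to the feature vector."""
    return min(known_patterns,
               key=lambda name: math.dist(features, known_patterns[name]))


# Hypothetical feature sets: (length/width ratio, perimeter in inches)
known_patterns = {
    "washer": (1.0, 3.1),
    "bracket": (2.5, 7.0),
}
print(nearest_pattern((1.1, 3.4), known_patterns))  # washer
```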
As you can see there are many vision tools that are available and the specific tools that one requires are application dependent. Today one can find machine vision type technology in virtually every manufacturing industry. The largest adopter by far is the electronics industry. In microelectronics, machine vision techniques are used to automatically perform inspections throughout the integrated circuit manufacturing process: photomask fabrication, post die slicing inspection, pre-cap inspection and final package inspection for mark integrity.
Throughout the manufacturing process, machine vision is also used to provide feedback for position correction in conjunction with a variety of manufacturing processes such as die slicing and bonding and wire bonding. In the macroelectronic industry machine vision is being used to inspect printed circuit boards for conductor width and spacing, to inspect populated printed circuit boards for completeness, and to perform post solder inspection for solder integrity.
As in microelectronics it is also being used to perform positional feedback in conjunction with component placement. It has become an integral part of the manufacturing process associated with the placement of chip carriers with relatively high density pin counts.
In industries that produce products on a continuous web, such as the paper, plastic, and textile industries, machine vision techniques are being used to perform an inspection of the integrity of the product being produced. Where coatings are applied to such products, machine vision is also being used to guarantee the coverage and quality of coverage. In the printing industry one finds machine vision being used in conjunction with registration.
The food industry finds machine vision being used in the process end to inspect products for sorting purposes, that is sorting out defective conditions or misshapen product or undersize/oversize product, etc. At the packaging end it is being used to verify the size and shape of contents, such as candy bars and cookies to make sure they will fit in their respective packages.
Throughout the consumer manufacturing industries one will find machine vision in various applications. These include label verification, that is, verifying the position, quality and correctness of the label. In the pharmaceutical industry one finds it being used to perform character verification, that is verifying the correctness as well as the integrity of the character sets corresponding to date and lot code.
The automotive industry finds itself using machine vision for many applications. These include looking at the flushness and fit of sheet metal assemblies, including the final car assembly; looking at paint qualities, such as gloss; inspecting for flaws on sheet metal stampings; and verifying the completeness of a variety of assemblies from ball bearings to transmissions. It is also used in conjunction with robots to provide visual feedback for sealant applications, windshield insertion applications, robotic hydropiercing operations, robotic seam tracking operations, etc.
Virtually every industry has seen the adoption of machine vision in some way or another. The toothbrush industry, for example, has vision systems that are used to verify the integrity of the toothbrush. The plastics industry looks at empty mold cavities to make sure that they are empty before filling them again. The container industry is using machine vision techniques widely. In metal cans they look at the quality of the can ends for cosmetic flaws, presence of compound, score depth on converted ends, etc. The can itself is examined to inspect it for defective conditions internally.
The glass container industry uses machine vision widely to inspect for sidewall defects, mouth defects and empty bottle states as well as dimensions and shapes. In these cases vision techniques have proven to be able to handle 1800 to 2000 objects per minute.
How do I know what machine vision techniques are most suitable for my application? A studied approach is usually required unless the application is one for which a system has already been widely deployed throughout an industry. In that case the pioneering work has already been done. Adaptation to one's own situation, while not trivial, may carry little risk. To find out if your application has been solved, ask around. Today most machine vision companies that do not offer the specific solution will, when asked whether they know of any other company that does, generally respond with candor and advise accordingly. Consultants may also be able to identify sources of specific solutions.
Having identified those sources, contact them to identify their referenceable accounts, and contact these in turn to determine why the supplier was selected, what the experience has been with the system and with service, and whether they would purchase the same product again. This should help to narrow down the number of companies to be solicited for the project. In this case the ultimate selection will no doubt be largely based on price, though policies such as training, warranty, service, spare parts, etc., should also be considered as they will impact the life cycle cost.
What do you do if you find your application is not a proliferation of someone else's success? In this case a detailed application description and functional specification should be prepared. This means really getting to know the application - what are all the exceptions and variables? The most critical ones are position and appearance. These must be understood and described comprehensively.
What are the specific requirements of the application? Will the system first have to find the object? Even minor translation due to vibration can be a problem for some machine vision executions. In addition to translation, will the part be presented in different rotations? Are different colors, shades, specular properties, finishes, etc. anticipated? Does the application require recognition? Is it gauging? What are the part tolerances? What percent of the tolerance band would it be acceptable to discriminate? If flaw detection, what size flaw is a flaw? Is the flaw characterized by reflectance change, by geometric change, etc.?
Having prepared the spec, at least a preliminary acceptance test for system buy-off should be prepared and solicitations should be forwarded to potential suppliers. How do you identify those suppliers? A telephone survey of the 150 or so companies is one approach. Again, use of a consultant can greatly accelerate the search. In any event, the leading question should be whether or not they have successfully delivered systems that address similar requirements.
Since we have already established that the application does not represent the proliferation of an existing system solution, the best one can expect is to find a number of companies that have been successful in delivering systems that address needs similar to yours and seem to have been able to handle similar complexities. So, for example, if the application is flaw detection - are the type and size of flaws similar? Are the part size, geometric complexity and material similar? Is part positioning similar, etc.?
This survey should narrow the number of companies to be solicited to four to six. The solicitation package should demand a certain proposal response. It is important to get a response that reflects that the application has truly been thought about. It is not sufficient to get a quotation and cover letter that basically says "trust me" and "when I get the order I will think about how I'm going to handle it." The proposal should give system details. What lighting will be used and why was that arrangement selected? How about the camera properties, have they been thought through? How about timing, resolution, sampling considerations, etc.?
Most importantly, does the proposal reflect an understanding of how the properties of the vision platform will be applied and can it defend that those properties are appropriate for the application? How will location analysis be handled? What image processing routines will be enabled specifically to address the application? A litany of the image processing routines inherent in the platform is not the issue. Rather what preprocessing is being recommended, if any? What analysis routines, etc.? Along with this an estimate should be prepared of the timing associated with the execution from snapping a picture through to signal availability reflecting the results of a decision. This should be consistent with your throughput requirements.
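A first-pass check of a timing estimate against throughput can be as simple as the sketch below. The stage names and times are hypothetical placeholders for whatever the vendor's proposal provides, and the stages are assumed to run one after another with no overlap; a pipelined system could tolerate longer individual stages.

```python
def time_budget_ms(objects_per_minute):
    """Time available per object, in milliseconds."""
    return 60_000.0 / objects_per_minute


def meets_throughput(stage_times_ms, objects_per_minute):
    """True if the summed stage times fit within the per-object budget."""
    return sum(stage_times_ms.values()) <= time_budget_ms(objects_per_minute)


# Hypothetical vendor estimate, from snapping a picture to signal availability.
stages = {"acquire image": 16.7, "process and analyze": 8.0, "decide and signal": 2.0}

print(time_budget_ms(2000))            # 30.0 ms per object at 2000 objects per minute
print(meets_throughput(stages, 2000))  # True
```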
When a vendor has thought through the application this way and conducted a rather comprehensive analysis, he is in a good position to provide both a schedule of project development tasks and a good estimate of the project cost. An excellent paper that describes this systematic approach as applied to a "Frobus Assembly" was written by Dr. Joseph Wilder and can be found in SPIE Volume 849 "Automated Inspection and High Speed Vision Architectures."
By insisting on this type of analysis in the proposal, both vendor and buyer should avoid surprises. Among other things it will give the buyer a sense that the application is understood. Those proposals responsive in this manner should be further evaluated using a systematic procedure such as Kepner-Tregoe decision making techniques. These involve establishing criteria to use as the basis of the evaluation, applying a weighting factor to each criterion and then evaluating each of the responses against each weighted criterion to come up with a value.
This value represents a measure of how a company satisfies the criterion along with the relative importance of that criterion to the project. In some cases, the score given should be 0 if the approach fails to satisfy one of the absolute requirements of the application. A good paper describing the application of these techniques to evaluating machine vision proposals was written by Ed Abbott and delivered at the SME sponsored Vision 85 Conference, March 25-28, 1985.
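The weighted evaluation can be sketched as a simple sum of weight times score, with a score of 0 when an absolute requirement is missed. This is a simplified illustration of the idea rather than a full statement of the Kepner-Tregoe method; the criteria, weights, scores and vendor names are all hypothetical.

```python
def weighted_score(weights, scores, musts_met=True):
    """Sum of weight * score over all criteria, or 0 if a must is not met."""
    if not musts_met:
        return 0
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


weights = {"lighting design": 10, "timing analysis": 7, "price": 5}
vendor_a = {"lighting design": 8, "timing analysis": 6, "price": 9}
vendor_b = {"lighting design": 9, "timing analysis": 9, "price": 4}

print(weighted_score(weights, vendor_a))                   # 167
print(weighted_score(weights, vendor_b))                   # 173
print(weighted_score(weights, vendor_b, musts_met=False))  # 0
```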
Having made a decision on a vendor, justifying the project may be the next issue. Significantly, justification based solely on labor displacement is unlikely to satisfy the ROI requirements. Quantifying additional savings is more difficult but in reality may yield an even greater impact than labor savings. Product returns and warranty cost should be evaluated to assess how much they will be reduced by the machine vision system. The cost of rework should be a matter of record. In addition, however, a value for the space associated with rework inventory, as well as for the rework inventory itself, should be calculated and included. The cost of rejects and related material costs, the cost of waste disposal associated with rejects, and freight costs on returns are all very tangible, quantifiable costs.
There are other savings, though less tangible, which should be estimated and quantified. These include items such as:
1. The cost of overruns to compensate for yield.
2. The avoidance of inspection bottlenecks and impact on inventory income and inventory turn-over.
3. The elimination of adding value to scrap conditions.
4. The potential for increased machine uptime and productivity accordingly.
5. The elimination of schedule upsets due to the production of items that require rework.
Another observation is that when considering the savings due to labor displacement, it is important to include all the savings. These include:
1. Recruiting
2. Training
3. Scrap and rework created while learning a new job
4. Average workers' compensation paid for injuries
5. Average educational grant per employee
6. Personnel/payroll department costs per employee
Overall the deployment of machine vision will result in improved and predictable quality. This in turn will yield improved customer satisfaction and an opportunity to increase market share - the biggest payback of all.