Six Sigma: Measure : 3 Develop Data Collection Plan
Methods for Collecting Data
Collecting information is not cheap. To help ensure that the data is relevant to the problem, some prior thought must be given to what is expected. Some guidelines are:
. Formulate a clear statement of the problem
. Define precisely what is to be measured
. List all the important characteristics to be measured
. Carefully select the right measurement technique
. Construct an uncomplicated data form
. Decide who will collect the data
. Arrange for an appropriate sampling method
. Decide who will analyze and interpret the results
. Decide who will report the results
A Common Data Collection Tool: Checksheets
•Checksheet Elements:
–A description of what data is being collected
–Places to put the data
–Room for comments
–Room to keep track of stratification factors
•Things to Remember about Checksheets:
–Keep the form simple to use and understand
–Include only information intended for use
–Try out the form before implementation and make changes if necessary
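The checksheet elements listed above map naturally onto a small data structure. The following is only a sketch: the field names, the defect types, and the use of shift as the stratification factor are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical checksheet: a description of what is collected, a place for
# tallies, room for comments, and a stratification factor (here, the shift).
checksheet = {
    "description": "Defects found at final inspection",
    "tallies": Counter(),        # keyed by (defect_type, shift)
    "comments": [],
}


def record(sheet, defect_type, shift, comment=None):
    """Add one tally mark, with an optional free-text comment."""
    sheet["tallies"][(defect_type, shift)] += 1
    if comment:
        sheet["comments"].append(comment)


record(checksheet, "scratch", "day")
record(checksheet, "scratch", "night", comment="New operator on line 2")
record(checksheet, "dent", "day")
record(checksheet, "scratch", "day")
print(checksheet["tallies"][("scratch", "day")])   # 2
```

Keying each tally by both defect type and shift keeps the stratification factor attached to every mark, so the counts can later be split by shift without re-collecting data.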
Without an operational definition, most data is meaningless. The ability to control quality requires measurement. Both attribute and variable specifications must be nailed down. Data collection includes both manual and automatic methods.
Manual data collection may use printed paper forms or data entry at the time the measurements are taken. Manual systems are labor intensive and subject to human errors in measuring and recording the correct values.
Automatic data collection includes electronic chart recorders and digital storage. The data collection frequency may be synchronous, based on a set time interval, or asynchronous, based on events. Automatic systems have higher initial costs than manual systems, and have the disadvantage of collecting both “good” and “erroneous” data. Advantages of automatic data collection systems include high accuracy rates and the ability to operate unattended.
The average human brain is not good at comparing more than a few numbers at a time. Therefore, a large amount of data is often difficult to analyze unless it is presented in some easily digested format. Graphs, charts, histograms, tallies, and Pareto diagrams are used to analyze and present data.
Data Coding
The efficiency of data entry and analysis is frequently improved by data coding. Problems due to not coding include:
. Inspectors trying to squeeze too many digits into small blocks on a check sheet form
. Reduced throughput and increased errors by clerks at keyboards reading and entering large sequences of digits for a single observation
. Insensitivity of analytic results due to round off of large sequences of digits
Coding by substitution:
Consider a dimensional inspection procedure in which the specification is nominal plus and minus 1.25 inches. The measurement resolution is 1/8 inch and inspectors, using a ruler, record plus and minus deviations from nominal. A typical recorded observation might be 32-3/8 inches cramped in a check sheet space designed for a three-character width. The data can be coded as integers expressing the number of 1/8 inch increments deviating from nominal. The suggestion that check sheet blocks could be made larger could be countered by the objection that there would be fewer samples and plot points per page.
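The substitution scheme described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the nominal of 32 inches is an assumption (the text does not state one), and the function names `encode_eighths` and `decode_eighths` are invented for the example.

```python
from fractions import Fraction

RESOLUTION = Fraction(1, 8)   # measurement resolution in inches (from the text)
NOMINAL = Fraction(32)        # hypothetical nominal; the text does not give one


def encode_eighths(measured, nominal=NOMINAL):
    """Code a measurement as the signed count of 1/8 inch steps from nominal."""
    steps = (Fraction(measured) - nominal) / RESOLUTION
    if steps.denominator != 1:
        raise ValueError("measurement is not a multiple of the resolution")
    return int(steps)


def decode_eighths(code, nominal=NOMINAL):
    """Recover the measurement in inches from its integer code."""
    return nominal + code * RESOLUTION


code = encode_eighths(Fraction(32) + Fraction(3, 8))   # 32-3/8 inches
print(code)                   # 3
print(decode_eighths(code))   # 259/8
```

With a tolerance of plus or minus 1.25 inches, in-specification codes run from -10 to +10, so each fits in a three-character check sheet block.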
Coding by truncation of repetitive place values:
Measurements such as 0.55303, 0.55310, 0.55308 in which the digits 0.553 repeat in all observations can be recorded as the last two digits expressed as integers. Depending on the analysis objectives, it may or may not be necessary to decode the measurements.
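A small sketch of truncation coding, using the three readings from the text. The function names are hypothetical, and the readings are passed as strings to `Decimal` on the assumption that exact decimal arithmetic is wanted rather than binary floating point.

```python
from decimal import Decimal

PREFIX = Decimal("0.553")     # place values shared by every observation
SCALE = Decimal("100000")     # the raw readings carry five decimal places


def encode_truncated(reading):
    """Keep only the digits after the repeated prefix, as an integer."""
    return int((Decimal(reading) - PREFIX) * SCALE)


def decode_truncated(code):
    """Restore the full reading from its integer code."""
    return PREFIX + Decimal(code) / SCALE


readings = ["0.55303", "0.55310", "0.55308"]
codes = [encode_truncated(r) for r in readings]
print(codes)   # [3, 10, 8]
```

Statistics such as the range or standard deviation can be computed directly on the codes and rescaled afterward; decoding each value is only needed when the original units must be reported.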
Techniques for Assuring Data Accuracy and Integrity
Bad data is not only costly to capture, but corrupts the decision-making process. Some considerations include:
. Avoid emotional bias relative to targets or tolerances when counting, measuring, or recording digital or analog displays.
. Avoid unnecessary rounding. Rounding often reduces measurement sensitivity. Averages should be calculated to at least one more decimal position than individual readings.
. If data occurs in time sequence, record the order of its capture.
. If an item characteristic changes over time, record the measurement or classification as soon as possible after its manufacture as well as after a stabilization period.
. To apply statistics which assume a normal population, determine whether the expected dispersion of data can be represented by at least 8 to 10 resolution increments. If not, the default statistic may be the count of observations which do or do not meet specification criteria.
. Screen or filter data to detect and remove data entry errors such as digit transposition and magnitude shifts due to a misplaced decimal point.
. Avoid removal by hunch. Use objective statistical tests to identify outliers.
. Each important classification identification should be recorded along with the data. This information can include: time, machine, auditor, operator, gage, lab, material, target, process change and conditions, etc.
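As one illustration of screening with an objective rule rather than a hunch, the sketch below flags values outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). The rule is chosen here only as an example; the text does not prescribe a specific test, and a formal test such as Grubbs' test could be used instead. The readings are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical readings; 101.0 mimics a misplaced decimal point (10.1 -> 101.0).
readings = [10.1, 10.3, 9.9, 10.2, 10.0, 101.0, 10.4, 9.8]
flagged = flag_outliers(readings)
print(flagged)   # [101.0]
```

Flagged values are candidates for review against the original record, not automatic deletions.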
It is important to select a sampling plan appropriate for the intended use of the data. There are no standards as to which plan is to be used for data collection and analysis, so the analyst makes a decision based upon experience and the specific needs. A few sampling methods are described below. There are many other sampling techniques that have been developed for specific needs.
Random Sampling
Sampling is often undertaken because of time and economic advantage. The use of a sampling plan requires randomness in sample selection. Obviously, true random sampling requires giving every part an equal chance of being selected for the sample.
The sample must be representative of the lot and not just the product that is easy to obtain. Thus, the selection of samples requires some up front thought and planning. Often, emphasis is placed on the mechanics of sampling plan usage and not on sample identification and selection. Sampling without randomness ruins the effectiveness of any plan. The product to be sampled may take many forms: in a layer, on a conveyor, in sequential order, etc.
The sampling sequence must be based on an independent random plan. The sample is determined by selecting an appropriate number from a hat or random number table.
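A pseudo-random number generator can stand in for the hat or random number table. The sketch below assumes a hypothetical lot of 200 serial-numbered parts; the function name and the fixed seed (used only to make the draw repeatable for audit) are illustrative.

```python
import random


def simple_random_sample(lot, sample_size, seed=None):
    """Draw sample_size items without replacement; every item is equally likely."""
    rng = random.Random(seed)
    return rng.sample(lot, sample_size)


lot = [f"part-{i:03d}" for i in range(1, 201)]   # hypothetical lot of 200 parts
sample = simple_random_sample(lot, 10, seed=42)
print(sample)
```

Deciding the serial numbers before going to the lot keeps the choice independent of which parts happen to be easy to reach.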
Purpose of Sampling
Reasons to use sampling:
•It is often impractical or too costly to collect all the data
•Sometimes data collection is a destructive process
•Sound conclusions can often be made from a relatively small amount of data
Types and Ways of Sampling
Process Sampling: Helps understand nature and condition of the process
Population Sampling: Determines characteristics of the population
Methods of Sampling
Random Sampling : Each item has equal probability of being selected
Stratified Random Sampling : Population “stratified” into groups; random selection within each group
One of the basic assumptions made in sampling is that the sample is randomly selected from a homogeneous lot. When sampling, the “lot” may not be homogeneous. For example, parts may have been produced on different lines, different machines, or under different conditions. One product line may have well maintained equipment, while another product line may have older or poorly maintained equipment.
The concept behind stratified sampling is to attempt to select random samples from each group or process that is different from other similar groups or processes. The resulting mix of samples thus drawn can be biased if the proportion of the samples does not reflect the relative frequency of the groups. For the person using the sample data, the implication is twofold: first, be aware of the possibility of stratified groups; second, phrase the data report such that the observations are relevant only to the sample drawn and may not necessarily reflect the overall system.
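One way to keep the sample mix in line with the relative frequency of the groups is proportional allocation, sketched below. The two production lines, their sizes, and the function name are hypothetical.

```python
import random


def stratified_sample(strata, total_size, seed=None):
    """Sample each stratum at random, sized in proportion to its share of the lot."""
    rng = random.Random(seed)
    lot_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation; rounding can shift the total by an item or two.
        n = round(total_size * len(items) / lot_size)
        sample[name] = rng.sample(items, min(n, len(items)))
    return sample


# Hypothetical lot: 300 parts from line A and 100 parts from line B.
strata = {
    "line_A": [f"A{i}" for i in range(300)],
    "line_B": [f"B{i}" for i in range(100)],
}
picked = stratified_sample(strata, total_size=20, seed=1)
print({name: len(items) for name, items in picked.items()})   # {'line_A': 15, 'line_B': 5}
```

Returning the sample keyed by stratum keeps the group label with each item, so results can be reported per line as well as overall.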
Systematic Random Sampling : Every nth item in row
Subgroup Sampling : 3 Samples at this point each hour
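The systematic random sampling entry above (every nth item) can be sketched as follows. The random starting point within the first n items is an assumption added here so that each item has the same chance of selection; the row of 100 items and the function name are hypothetical.

```python
import random


def systematic_sample(items, n, seed=None):
    """Take every nth item after a random start within the first n items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = random.Random(seed).randrange(n)
    return items[start::n]


row = list(range(1, 101))          # hypothetical row of 100 items
sample = systematic_sample(row, n=10, seed=7)
print(sample)
```

If the process has a cycle whose period matches n, a systematic sample can repeatedly hit the same point in the cycle, so the interval is worth checking against known process rhythms.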
Sequential Sampling
Sequential sampling plans are similar to multiple sampling plans except that sequential sampling can theoretically continue indefinitely. Usually, these plans are ended after the number inspected has exceeded three times the sample size of a corresponding single sampling plan. Sequential testing is used for costly or destructive testing with sample sizes of one, and is based on a probability ratio test developed by Wald (1947).
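A minimal sketch of Wald's sequential probability ratio test for a defect rate, inspecting one item at a time. The acceptable rate `p0`, rejectable rate `p1`, the risks `alpha` and `beta`, and the `max_n` truncation parameter are all illustrative assumptions; the thresholds use Wald's standard approximations.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.10, max_n=None):
    """Wald's sequential probability ratio test for a defect rate.

    observations: iterable of booleans, True meaning a defective item.
    Returns (decision, items_inspected).
    """
    upper = math.log((1 - beta) / alpha)    # crossing it favors p1: reject the lot
    lower = math.log(beta / (1 - alpha))    # crossing it favors p0: accept the lot
    llr = 0.0                               # running log-likelihood ratio
    count = 0
    for count, defective in enumerate(observations, start=1):
        if defective:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject", count
        if llr <= lower:
            return "accept", count
        if max_n is not None and count >= max_n:
            break                           # truncate the plan
    return "undecided", count


print(sprt([False] * 100, p0=0.01, p1=0.10))   # ('accept', 24)
print(sprt([True, True], p0=0.01, p1=0.10))    # ('reject', 2)
```

Setting `max_n` to three times the sample size of the corresponding single sampling plan reproduces the usual stopping rule mentioned above; how an undecided lot is then dispositioned is left to the plan.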