Data processing

Analysis of dietary data is a two-step process involving:

  1. The measurement of food consumption using dietary assessment tools
  2. The conversion of reported dietary intakes into estimates of food and nutrient intake

Converting reported dietary intakes into such estimates first requires the data to be coded. The accuracy of the final estimates depends on the precision and completeness of the initial measurement of consumption, on consistency and precision during coding, and on the use of a representative and comprehensive food composition database. The following sections outline the processes involved in converting collected dietary data into estimates of food and nutrient intake.

Computer-based dietary assessment tools (e.g. INTAKE24 [2]) facilitate data entry and process dietary data automatically. However, the algorithms and assumptions underlying such automated systems are the same as those needed when processing a traditional dietary assessment.

The coding process used depends on the dietary assessment method:

  1. Coding of open-ended data (from weighed or estimated food diaries, or 24-hour recalls): generally carried out using an electronic database, which the coder searches to match each reported food item with a food or food code on the system (e.g. a food composition table) [3].
  2. Coding of dietary data with a fixed list of dietary items (from a food-frequency questionnaire): the list of items is generally created during the design phase, when every item on the list should be matched to items on the system. Thus, little or no effort is needed from a coder to match reported items with items on the system.
  3. Where temporal and geographical information is available, as is common in diet records and recalls, these attributes (e.g. time and place of eating) can also be coded.
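For open-ended data, the coder's search step can be supported by software that proposes candidate codes. The sketch below is a minimal illustration, assuming a small hypothetical food list (codes `F001` to `F005` are invented) and a simple word-overlap score; it uses Python's standard `difflib.get_close_matches` so that minor misspellings such as "yogurt" for "yoghurt" still match. The coder still makes the final choice.

```python
import difflib

# Hypothetical food list standing in for a food composition table; the codes and
# descriptions are invented for illustration.
FOOD_CODES = {
    "F001": "yoghurt, whole milk, plain",
    "F002": "yoghurt, low fat, plain",
    "F003": "pizza, cheese and tomato",
    "F004": "pasta, white, boiled",
    "F005": "bread, white, sliced",
}


def tokens(text):
    """Lower-case a description and split it into words, ignoring commas."""
    return text.lower().replace(",", " ").split()


def match_score(reported, description, cutoff=0.8):
    """Fraction of reported words that closely match some word in the description."""
    reported_words = tokens(reported)
    description_words = tokens(description)
    if not reported_words:
        return 0.0
    hits = sum(
        1
        for word in reported_words
        # get_close_matches tolerates small spelling differences ('yogurt' vs 'yoghurt').
        if difflib.get_close_matches(word, description_words, n=1, cutoff=cutoff)
    )
    return hits / len(reported_words)


def suggest_codes(reported, n=3):
    """Return up to n (code, description, score) candidates, best match first."""
    scored = [
        (code, description, match_score(reported, description))
        for code, description in FOOD_CODES.items()
    ]
    candidates = [c for c in scored if c[2] > 0]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:n]
```

For example, `suggest_codes("low fat yogurt")` ranks the low-fat yoghurt entry above the whole-milk one, which helps avoid the yoghurt coding error described under potential sources of error.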

In each case, a reported or fixed item can be matched to multiple food items on the system. For example, reported consumption of ‘pizza’ may be matched to:

  1. A single item of ‘pizza’ in a food composition table
  2. Three items of ‘pizza’ from three different brands, each contributing one third of the weight
  3. Multiple items (flour, tomato, cheese, salt, and others) based on a recipe, e.g. one documented by a culinary society
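The one-to-many matching above amounts to splitting the reported weight across several composition entries and summing their nutrients. The following is a minimal sketch, assuming invented per-100 g values for hypothetical pizza entries (not taken from any real table); a recipe-based mapping (option 3) would use the same structure, with ingredients and their weight shares in place of brands.

```python
# Hypothetical nutrient values per 100 g, invented for illustration (not from any real table).
COMPOSITION = {
    "pizza, generic": {"energy_kcal": 250.0, "fat_g": 10.0},
    "pizza, brand A": {"energy_kcal": 240.0, "fat_g": 9.0},
    "pizza, brand B": {"energy_kcal": 260.0, "fat_g": 11.0},
    "pizza, brand C": {"energy_kcal": 270.0, "fat_g": 12.0},
}

# Alternative mappings for the same reported item: each is a list of
# (composition entry, share of the reported weight). Shares must sum to 1.
MAPPINGS = {
    "single item": [("pizza, generic", 1.0)],
    "three brands": [
        ("pizza, brand A", 1 / 3),
        ("pizza, brand B", 1 / 3),
        ("pizza, brand C", 1 / 3),
    ],
}


def nutrients_for(mapping_name, grams):
    """Split the reported weight across the mapped entries and sum their nutrients."""
    shares = MAPPINGS[mapping_name]
    if abs(sum(share for _, share in shares) - 1.0) > 1e-9:
        raise ValueError("weight shares must sum to 1")
    totals = {}
    for entry, share in shares:
        for nutrient, per_100g in COMPOSITION[entry].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams * share / 100
    return totals
```

Comparing `nutrients_for("single item", 300)` with `nutrients_for("three brands", 300)` shows how the choice of mapping, itself an assumption about typical consumption, changes the estimate.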

This process requires assumptions about typical dietary consumption in a population and depends on which items are available in the food composition table. Ideally, the matching procedure is developed from detailed observations of dietary consumption in the population (e.g. multiple-day weighed food diaries) and validated by objective methods (e.g. duplicate diet analysis).

Potential sources of error

(i) Difficulties in interpreting written details in food diaries or 24-hour recalls, for example:

  1. Food diaries completed by young children (or even adults) may be difficult to read if handwriting or spelling is poor.
  2. Information obtained during an interviewer-administered 24-hour recall may be poorly documented by the interviewer.

(ii) Coding which does not best match the food or beverage actually consumed, for example:

  1. Coding a pot of yoghurt as standard yoghurt when low-fat yoghurt was reported in the food diary.
  2. Coding a pot of yoghurt as standard yoghurt when low-fat yoghurt was actually consumed but reported without any details.
  3. Incorrect assumptions assigned to food items listed in a food-frequency questionnaire, which can readily happen when the population used to develop the questionnaire differs from the study population.

(iii) Human error, i.e. mistakes made during food coding, for example:

  • Number displacement: mistakenly coding 200g of pasta instead of the quantity reported in the diary, e.g. 100g.
  • Forgetting to code spread (e.g. butter or margarine) twice, i.e. once for each slice of bread, when a sandwich is coded.
  • When it has been reported that half a cup of tea has been drunk, halving only the weight of tea consumed and not the weight of milk and sugar.
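Two of the errors above come from handling composite items inconsistently. The sketch below shows one way software could enforce consistent handling, assuming a hypothetical default recipe for a cup of tea (the gram weights are invented, not from any coding manual): a partial portion is applied to every component, and spread is coded per slice of bread.

```python
# Hypothetical default recipe for one full cup of tea with milk and sugar (grams);
# the weights are illustrative assumptions, not values from a coding manual.
TEA_WITH_MILK_AND_SUGAR = {"tea, infusion": 200.0, "milk, semi-skimmed": 30.0, "sugar, white": 5.0}


def scale_composite(components, fraction):
    """Scale every component of a composite item, not just the main ingredient."""
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    return {food: grams * fraction for food, grams in components.items()}


def code_sandwich(bread_per_slice_g, spread_per_slice_g, slices=2):
    """Code bread and spread per slice so that spread is counted once for each slice."""
    return {
        "bread": bread_per_slice_g * slices,
        "spread": spread_per_slice_g * slices,
    }
```

With these helpers, "half a cup of tea" becomes `scale_composite(TEA_WITH_MILK_AND_SUGAR, 0.5)`, which halves the milk and sugar as well as the tea.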

(iv) Personal characteristics of the food coder may also influence their judgement when interpreting a food record [1], for example:

  1. Age
  2. Nationality
  3. Familiarity with cooking
  4. Views on portion sizes
  5. Personal eating habits
  6. Degree of familiarity with the population being investigated

Standardisation of the coding process

The coding process involves numerous assumptions and judgements. In the process of coding open-ended dietary data, some degree of estimation and intuition is involved, irrespective of the skill and experience of the coder. Research groups have endeavoured to minimise coding errors by providing training for dietary coders and implementing quality control checks [15].

Attempts have been made to standardise the coding process [5]. For instance, a code book designed for use in the INTERMAP study [3] acted as a ‘rule book’, with the aim of removing the need for coders to make subjective decisions.

The number of errors during coding may also be minimised if ‘coding rules’ are established to deal with incomplete or ambiguous entries in the food record [3]. For instance, the dietary coding manual (see Figure D.5.1) developed for use in the Infant Feeding Peer Support Trial provides standard infant portion sizes for commonly consumed food and drinks as well as listing default entries to aid coders in the coding process.
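Coding rules of this kind can be expressed as lookup tables that software applies before a coder intervenes. The sketch below is a minimal illustration with hypothetical default foods and portions (they are not taken from the INTERMAP code book or the Infant Feeding Peer Support Trial manual); it also records which rules were applied so that defaults can be reviewed later.

```python
# Hypothetical 'code book': default descriptions used when the record lacks detail.
# Keys are (reported food, missing attribute); values are illustrative only.
DEFAULT_RULES = {
    ("milk", "type"): "milk, semi-skimmed",
    ("cooking fat", "type"): "vegetable oil, blended",
    ("yoghurt", "type"): "yoghurt, whole milk, plain",
}

# Hypothetical default portion sizes (grams) for some default descriptions.
DEFAULT_PORTIONS_G = {
    "milk, semi-skimmed": 30.0,
}


def apply_rules(entry):
    """Fill missing details of a diary entry from the code book, noting which rules fired."""
    coded = dict(entry)
    applied = []
    if coded.get("description") is None:
        default = DEFAULT_RULES.get((coded["food"], "type"))
        if default is not None:
            coded["description"] = default
            applied.append("default type")
    if coded.get("grams") is None and coded.get("description") in DEFAULT_PORTIONS_G:
        coded["grams"] = DEFAULT_PORTIONS_G[coded["description"]]
        applied.append("default portion")
    coded["rules_applied"] = applied
    return coded
```

Keeping the list of applied rules with each entry makes it possible to count how often defaults were used and to re-run the analysis if a default is later revised.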

Figure D.5.1 Example of dietary coding manual page for baby foods from Infant Feeding Peer Support Trial.
Source: Department of Public Health and Epidemiology, University College London.


Key considerations

Key considerations or guidelines for improving the coding process include the following [1, 3]:

  1. Train coders to become familiar with the types of foods and supplement products consumed by the study population. For example, a duplicate diet or direct observation can be used to assess the quantitative accuracy of food coding from interview-based 24-hour recalls.
  2. Become accustomed to the workings of the dietary analysis programme and its food composition database.
  3. Ask respondents to keep the packaging of, e.g., ready meals and unusual processed foods and return it with the record; this can help with identification and coding.
  4. Adopt a standardised protocol for handling the coding of each new/substitute food item (‘food rules’) so that all coders deal with these items in a consistent manner.
  5. Develop a ‘code book’ or set of default rules to deal with missing information for certain types of foods and beverages e.g. for unknown types of cooking fat or an unknown type of milk added to a bowl of cereal.
  6. Implement quality control procedures such as routine spot-checks. It is good practice to check all entries keyed by new coders until their error rate drops to an acceptable level. Thereafter, a pre-agreed percentage of entries may be checked on a regular basis as a means of ongoing quality control.
  7. Use independent duplicate coding by two coders.
  8. Conduct a validation study with sensitivity analyses of the different assumptions and judgements made.
  9. Keep the database up-to-date by updating nutritional composition data on foods already listed on the database and adding new foods.
  10. Keep an inventory of food composition information for foods that are not included in the food composition database. This may be particularly the case for niche foods such as sports products, low-fat varieties, or baby and toddler foods.
  11. Build edit checks into the software to prevent gross data entry errors, e.g. so that 1000g of pasta cannot be entered in place of 100g.
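Points 6, 7 and 11 can be partly automated. The sketch below assumes hypothetical plausibility limits per food (a real study team would set its own) and represents each coded entry as a (food code, grams) pair; it flags implausible quantities at entry time and computes the proportion of entries on which two independent coders disagree.

```python
# Hypothetical plausible single-portion ranges in grams; real limits would be set by the study.
PORTION_LIMITS_G = {
    "pasta, white, boiled": (10.0, 600.0),
    "bread, white, sliced": (10.0, 200.0),
}


def edit_check(food, grams):
    """Return a list of warnings for one entry; an empty list means the entry passed."""
    warnings = []
    if grams <= 0:
        warnings.append("non-positive quantity")
    limits = PORTION_LIMITS_G.get(food)
    if limits is None:
        warnings.append("no plausibility limits defined")
    else:
        low, high = limits
        if not low <= grams <= high:
            warnings.append(f"{grams:g} g outside plausible range {low:g}-{high:g} g")
    return warnings


def discrepancy_rate(coder_a, coder_b):
    """Proportion of entries where two independent coders differ in food code or weight."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same entries")
    if not coder_a:
        return 0.0
    disagreements = sum(1 for a, b in zip(coder_a, coder_b) if a != b)
    return disagreements / len(coder_a)
```

The same `discrepancy_rate` function could be applied to spot-checked entries (a checker's version against the original coder's) to track whether a new coder's error rate has fallen to the agreed level.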

Coding programmes for a food frequency questionnaire

Paper-based food frequency questionnaires are often designed to be scanned so that participants' responses can be encoded automatically. Once dietary intake information is in electronic form, a computer program is used to generate data on dietary intakes. A wide range of computer systems, of varying quality, is available to facilitate this processing.

Analysis software links consumption data to food composition data (e.g. that provided by McCance and Widdowson’s The Composition of Foods [7]).
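For a food frequency questionnaire, this linkage typically combines three pieces of information: the frequency category ticked, an assumed portion size for the item, and nutrient values per 100 g. The sketch below illustrates that calculation under stated assumptions. The frequency-to-times-per-day conversions are assumed mid-points, and the items, portions and nutrient values are invented. It is not the method of any particular program named in this section.

```python
# Assumed conversion of FFQ frequency categories to times per day (illustrative mid-points;
# each questionnaire's analysis program defines its own values).
FREQ_PER_DAY = {
    "never or less than once a month": 0.0,
    "1-3 per month": 2 / 30,
    "once a week": 1 / 7,
    "2-4 per week": 3 / 7,
    "5-6 per week": 5.5 / 7,
    "once a day": 1.0,
    "2-3 per day": 2.5,
}

# Hypothetical FFQ items: assumed standard portion (g) and nutrients per 100 g.
FFQ_ITEMS = {
    "apples": {"portion_g": 100.0, "per_100g": {"energy_kcal": 50.0, "fibre_g": 2.0}},
    "white bread (slice)": {"portion_g": 36.0, "per_100g": {"energy_kcal": 220.0, "fibre_g": 2.5}},
}


def daily_intake(responses):
    """Convert {item: frequency category} responses into estimated nutrients per day."""
    totals = {}
    for item, category in responses.items():
        spec = FFQ_ITEMS[item]
        grams_per_day = FREQ_PER_DAY[category] * spec["portion_g"]
        for nutrient, per_100g in spec["per_100g"].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + grams_per_day * per_100g / 100
    return totals
```

Because the frequency conversions and portion sizes are fixed when the questionnaire is designed, errors in these assumptions affect every participant in the same way, which is why they matter when the study population differs from the development population.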

Universities or research centres tend to develop their own software and databases which may contain specific information on particular nutrients or food constituents. Analysis programs have been specifically developed for use in large cohort studies. Examples include DINER (Data Into Nutrients for Epidemiological Research) [15], CAFE (Compositional Analyses from Frequency Estimates) [14] and FETA (FFQ EPIC Tool for Analysis) designed for use in the EPIC-Norfolk study into diet and cancer [11].

Commercial coding programmes are also available, for example:

  1. CompEat
  2. Dietplan6
  3. Microdiet
  4. Nutmeg Menu Planner
  5. Saffron Nutrition
  6. WinDiets Research
  7. WISP

Such coding programmes can vary greatly in their design, utility, and target audience e.g. health professionals, catering establishments, sports industries, nutritionists and dieticians or for personal use.

Which coding programme to use?

A large range of options is therefore available. The choice of coding programme will depend on the population being studied, and the programme's nutritional database should incorporate foods and portion sizes appropriate to that population. For example, if the nutritional intakes of infants and young children are being examined, foods normally consumed by these age groups should be available within the programme.

Ideally, analysis programmes should be flexible and easily updateable to add new variables and keep abreast of the changing food supply. Approximately 10,000 foods are estimated to be modified, newly generated, or discontinued each year in the UK [15].

Types of variables generated from consumption data

Depending on the data collected and the research questions, diverse variables can be generated, for example:

  1. Amounts consumed of nutrients, foods, and food groups
  2. Frequency of consumption of foods and food groups
  3. Frequency of consumption at a particular eating occasion (e.g. breakfast)
  4. Measures of adherence to dietary patterns or diet quality
  5. Non-dietary indices based on properties of each food: glycaemic index, diet cost, a score indicating greenhouse gas emission levels associated with a habitual diet
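Several of these variables can be derived from the same coded records. The sketch below uses invented records from a hypothetical two-day diary to compute mean daily grams per food group, the number of days with a given eating occasion, and dietary glycaemic index calculated as the carbohydrate-weighted mean of the foods' GI values. Field names such as `carb_g` and `gi` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical coded records from a two-day diary; values are invented for illustration.
# Each record: day, eating occasion, food group, grams eaten, carbohydrate (g), food GI.
RECORDS = [
    {"day": 1, "occasion": "breakfast", "food_group": "cereals", "grams": 40.0, "carb_g": 30.0, "gi": 70.0},
    {"day": 1, "occasion": "lunch", "food_group": "fruit", "grams": 100.0, "carb_g": 10.0, "gi": 40.0},
    {"day": 2, "occasion": "lunch", "food_group": "cereals", "grams": 60.0, "carb_g": 40.0, "gi": 50.0},
]


def food_group_means(records, n_days):
    """Mean grams per day consumed from each food group."""
    totals = defaultdict(float)
    for r in records:
        totals[r["food_group"]] += r["grams"]
    return {group: grams / n_days for group, grams in totals.items()}


def occasion_frequency(records, occasion):
    """Number of distinct days on which the given eating occasion was recorded."""
    return len({r["day"] for r in records if r["occasion"] == occasion})


def dietary_gi(records):
    """Carbohydrate-weighted mean GI: sum(GI x carbohydrate) / sum(carbohydrate)."""
    with_gi = [r for r in records if r.get("gi") is not None]
    carbs = sum(r["carb_g"] for r in with_gi)
    if carbs <= 0:
        return None
    return sum(r["gi"] * r["carb_g"] for r in with_gi) / carbs
```

For the records above, cereals and fruit each average 50 g per day, breakfast is recorded on one day, and dietary GI is (70 x 30 + 40 x 10 + 50 x 40) / 80 = 56.25.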

A data processing algorithm can build in an assessment of nutritional adequacy, either for individual nutrients or at the whole-diet level. Such an algorithm is useful when the purpose of dietary assessment is to screen for nutritional adequacy or when researchers plan to provide timely feedback to participants.
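A deliberately simplified sketch of such a screen is shown below. It compares estimated daily intakes with illustrative reference values (placeholders, not official dietary reference values) at the nutrient level, then summarises the whole diet as the share of assessed nutrients meeting their reference. A real screen would use age- and sex-specific reference values and would account for the uncertainty in individual intake estimates.

```python
# Illustrative daily reference values (placeholders); a real screen would use the dietary
# reference values appropriate to each participant's age and sex.
REFERENCE = {"iron_mg": 9.0, "calcium_mg": 700.0, "vitamin_c_mg": 40.0}


def adequacy_feedback(intakes, reference=REFERENCE):
    """Nutrient-level check: compare each estimated daily intake with its reference value."""
    feedback = {}
    for nutrient, ref in reference.items():
        intake = intakes.get(nutrient)
        if intake is None:
            feedback[nutrient] = "not assessed"
        elif intake < ref:
            feedback[nutrient] = "below reference"
        else:
            feedback[nutrient] = "meets reference"
    return feedback


def whole_diet_summary(feedback):
    """Whole-diet summary: share of assessed nutrients that meet their reference value."""
    assessed = [status for status in feedback.values() if status != "not assessed"]
    if not assessed:
        return None
    return sum(status == "meets reference" for status in assessed) / len(assessed)
```

The per-nutrient messages could be returned to participants as timely feedback, while the whole-diet share is one possible summary; a diet quality index would be an alternative.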

Food composition databases

Food composition databases provide detailed information on the nutritional composition of foods. These databases may be:

  1. Paper or online document-based food composition tables e.g. McCance and Widdowson’s The Composition of Foods [7]
  2. Computer-based food composition databases such as the United States Department of Agriculture (USDA) National Nutrient Database, or McCance and Widdowson’s Composition of Foods Integrated Dataset (CoFID)

Currently, there are over 150 food composition tables and electronic databases worldwide [6]. The website of LanguaL [4], an international framework for food description, provides links to food composition databases from various countries.

Variation in food composition databases

Food composition databases vary greatly in terms of the number and detail of nutrients and other food chemicals or properties included. Please see the dedicated page for more detail on the potential sources of error associated with food composition databases.
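When data from more than one database are combined, these differences have to be reconciled explicitly. The sketch below uses invented entries for the same food in two hypothetical databases that report energy in different units (converted using 1 kcal = 4.184 kJ) and cover different nutrients. It lists, for each shared food, the nutrients that only one database provides.

```python
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ

# Hypothetical entries for the same food in two databases that report energy in
# different units and cover different nutrients; values are invented for illustration.
DB_A = {"apple": {"energy_kcal": 50.0, "fibre_g": 2.0, "vitamin_c_mg": 5.0}}
DB_B = {"apple": {"energy_kj": 209.2, "fibre_g": 1.8}}


def harmonise(entry):
    """Return a copy of a food entry with energy expressed in kcal."""
    out = dict(entry)
    if "energy_kj" in out and "energy_kcal" not in out:
        out["energy_kcal"] = out.pop("energy_kj") / KJ_PER_KCAL
    return out


def coverage_gaps(db_a, db_b):
    """For foods present in both databases, list nutrients reported by only one of them."""
    gaps = {}
    for food in sorted(db_a.keys() & db_b.keys()):
        a = set(harmonise(db_a[food]))
        b = set(harmonise(db_b[food]))
        gaps[food] = {"only_in_a": a - b, "only_in_b": b - a}
    return gaps
```

Unit harmonisation resolves only part of the problem: the fibre values in this example still differ, as they can in practice where databases use different analytical methods or definitions.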

References

  1. Woolhead C, Gibney MJ, Walsh MC, Brennan L, Gibney ER. A generic coding approach for the examination of meal patterns. Am J Clin Nutr. 2015;102(2):316-23. doi: 10.3945/ajcn.114.106112.
  2. Bradley J, Simpson E, Poliakov I, Matthews JN, Olivier P, Adamson AJ, et al. Comparison of INTAKE24 (an Online 24-h Dietary Recall Tool) with Interviewer-Led 24-h Recall in 11-24 Year-Old. Nutrients. 2016;8(6).
  3. Guan VX, Probst YC, Neale EP, Tapsell LC. Evaluation of the dietary intake data coding process in a clinical setting: implications for research practice. PLoS One. 2019;14(8):e0221047. doi: 10.1371/journal.pone.0221047.
  4. Danish Food Informatics. LanguaL - the International Framework for Food Description [cited 2016 Dec 1]. Available from: http://www.langual.org/Default.asp.
  5. Mulligan AA, Luben RN, Bhaniani A, et al. A new tool for converting food frequency questionnaire data into nutrient and food group values: FETA research methods and availability. BMJ Open. 2014;4(3):e004503. doi: 10.1136/bmjopen-2013-004503.
  6. European Food Information Resource. European Food Information Resource [cited 2016 Dec 1]. Available from: http://www.eurofir.org/.
  7. Food Standards Agency. McCance and Widdowson's the Composition of Foods. Seventh Summary ed. Cambridge: Royal Society of Chemistry; 2014.
  8. Gibson RS. Principles of Nutritional Assessment. 2nd ed. Oxford: Oxford University Press; 2005.
  9. Greenfield H, Southgate DAT. Food Composition Data: Production, Management and Use. 2nd ed. London: Elsevier; 2003.
  10. Ishihara J, Inoue M, Kobayashi M, Tanaka S, Yamamoto S, Iso H, et al. Impact of the revision of a nutrient database on the validity of a self-administered food frequency questionnaire (FFQ). J Epidemiol. 2006;16(3):107-16.
  11. Mulligan AA, Luben RN, Bhaniani A, Parry-Smith DJ, O'Connor L, Khawaja AP, et al. A new tool for converting food frequency questionnaire data into nutrient and food group values: FETA research methods and availability. BMJ Open. 2014;4(3).
  12. Prosky L. What is dietary fiber? J AOAC Int. 2000;83(4):985-7.
  13. Schakel SF, Dennis BH, Wold AC, Conway R, Zhao L, Okuda N, et al. Enhancing data on nutrient composition of foods eaten by participants in the INTERMAP study in China, Japan, the United Kingdom, and the United States. J Food Compos Anal. 2003;16(3):395-408.
  14. Welch AA, Luben R, Khaw KT, Bingham SA. The CAFE computer program for nutritional analysis of the EPIC-Norfolk food frequency questionnaire and identification of extreme nutrient values. J Hum Nutr Diet. 2005;18(2):99-116.
  15. Welch AA, McTaggart A, Mulligan AA, Luben R, Walker N, Khaw KT, et al. DINER (Data Into Nutrients for Epidemiological Research) - a new data entry program for nutritional analysis in the EPIC-Norfolk cohort and the 7-day diary method. Public Health Nutr. 2001;4(6):1253-65.