Make a csv file by applying the specifications in the Access register database to the raw data of the selected group of files (filgruppe). All files in the selected group are processed unless specific files are chosen with the koblid or select argument. Restricting the run with koblid or select is especially useful for testing.

This is the most used function in KHelse for processing raw data. The function lag_fil() is an alias of make_file().

Usage

make_file(
  group = NULL,
  koblid = NULL,
  aggregate = NULL,
  save = FALSE,
  year.geo = NULL,
  implicitnull = NULL,
  row = NULL,
  base = NULL,
  parallel = deprecated(),
  raw = NULL,
  select = NULL
)

lag_fil(
  group = NULL,
  koblid = NULL,
  aggregate = NULL,
  save = FALSE,
  year.geo = NULL,
  implicitnull = NULL,
  row = NULL,
  base = NULL,
  parallel = deprecated(),
  raw = NULL,
  select = NULL
)

mf(
  group = NULL,
  koblid = NULL,
  aggregate = NULL,
  save = FALSE,
  year.geo = NULL,
  implicitnull = NULL,
  row = NULL,
  base = NULL,
  parallel = deprecated(),
  raw = NULL,
  select = NULL
)

Arguments

group

The name of the file group, as specified in filgruppe.

koblid

KOBLID from table tbl_Koble

aggregate

Logical value. Default is TRUE. Aggregates the data according to the specification in the registration database. The default can be set globally with the option orgdata.aggregate.

save

Logical value. Default is FALSE. If TRUE, the result is saved as a .csv file by calling the save_file() function.

year.geo

Which reference year to use for geographical coding. If missing, the global option orgdata.year is used.

implicitnull

Logical value. Default is TRUE, which adds implicit null to the dataset. The default can be set globally with the option orgdata.implicit.null.

row

Select specific row number(s) only. Useful for debugging. See the Debugging article for details.

base

Logical value. If TRUE, the year in the original data is used as the base year when recoding the geographical codes. Default is FALSE, which uses all available codes in the geo codebook.

parallel

Deprecated, as indicated by deprecated() under Usage. Logical or numeric value. If TRUE, processing runs in parallel using 50% (0.5) of the local cores. A different share can be given as a number, for example parallel = 0.75 to use 75% of the cores. The maximum allowed is 80% of the cores. Default is FALSE, i.e. sequential processing.

raw

Logical value. Default is FALSE, as set in config. If TRUE, the original raw data is read directly from the source file even when the dataset is already available in DuckDB, without the need to unmark KONTROLLERT in the Access database.

select

Select which of the valid files to process, as an alternative to using KOBLID. To select the first 5 files, write select = 1:5. Use select = "last" to select the last, i.e. most recent, file.
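
The meaning of select can be pictured as positional indexing into the valid files of a group. The sketch below is only an illustration in base R and not the package's implementation: pick_files() and the koblids vector are hypothetical, and the files are assumed to be ordered from oldest to most recent.

```r
# Hypothetical helper (not part of the package): mimics the documented
# meaning of `select` on a plain vector of KOBLIDs.
# Assumption: `koblids` is ordered from oldest to most recent file.
pick_files <- function(koblids, select = NULL) {
  if (is.null(select)) {
    return(koblids)                     # no selection: all files
  }
  if (identical(select, "last")) {
    return(koblids[length(koblids)])    # the last, i.e. most recent, file
  }
  if (!is.numeric(select)) {
    stop("`select` must be numeric positions or \"last\"")
  }
  # Keep only positions that exist, then select by position
  select <- select[select %in% seq_along(koblids)]
  koblids[select]
}

# Made-up KOBLIDs for illustration only
koblids <- c(101, 105, 110, 120, 121, 125, 130)

pick_files(koblids, select = 1:5)      # first five files
pick_files(koblids, select = "last")   # most recent file
```

Selecting by position rather than by KOBLID means the call does not have to change when new files are registered in the group.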

See also

Other filegroups functions: make_filegroups()

Examples

if (FALSE) { # \dontrun{
dt <- make_file("ENPERSON")
dt <- make_file("ENPERSON", raw = TRUE) # Skip DuckDB and read directly from original files
dt <- make_file("ENPERSON", koblid = 120:125) # Select specific files only
dt <- make_file("ENPERSON", select = "last") # Select most recent file
} # }
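
Several arguments (aggregate, implicitnull, year.geo) are NULL in Usage and are described as having a global option. A minimal sketch of that fallback pattern in base R follows. It assumes the options are set with options() and read with getOption(). The helper resolve_arg() is hypothetical, and how make_file() resolves its arguments internally is not confirmed here.

```r
# Hypothetical helper (not part of the package): an explicit argument wins,
# otherwise the global option is used, otherwise a fallback value.
resolve_arg <- function(value, option, fallback) {
  if (!is.null(value)) {
    return(value)
  }
  getOption(option, default = fallback)
}

# Change the global default and keep the previous setting for restoring
old <- options(orgdata.aggregate = FALSE)

resolve_arg(NULL, "orgdata.aggregate", TRUE)  # taken from the global option
resolve_arg(TRUE, "orgdata.aggregate", TRUE)  # the explicit value wins

# Restore the previous setting
options(old)
```

Setting the option once per session avoids repeating the same argument in every call, while an explicit argument still overrides it for a single run.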