Make a csv file from the specifications in the Access register
database by applying them to the raw data of the selected group of files
(filgruppe). All files under the selected group will be processed
unless specific files are chosen with the koblid argument (KOBLID) or
the select argument. Specifying koblid or select
is especially useful for testing.
This is the most used function in KHelse for processing
raw data. The function lag_fil() is an alias for make_file().
Usage
make_file(
group = NULL,
koblid = NULL,
aggregate = NULL,
save = FALSE,
year.geo = NULL,
implicitnull = NULL,
row = NULL,
base = NULL,
parallel = deprecated(),
raw = NULL,
select = NULL
)
lag_fil(
group = NULL,
koblid = NULL,
aggregate = NULL,
save = FALSE,
year.geo = NULL,
implicitnull = NULL,
row = NULL,
base = NULL,
parallel = deprecated(),
raw = NULL,
select = NULL
)
mf(
group = NULL,
koblid = NULL,
aggregate = NULL,
save = FALSE,
year.geo = NULL,
implicitnull = NULL,
row = NULL,
base = NULL,
parallel = deprecated(),
raw = NULL,
select = NULL
)
Arguments
- group
The name of the file group as specified in filgruppe.
- koblid
KOBLID from table tbl_Koble.
- aggregate
Logical value. Default is TRUE. Aggregate data according to the specification in the registration database. The global option is orgdata.aggregate.
- save
Logical value. Default is FALSE. If TRUE, save the result as a .csv file by calling the save_file() function.
- year.geo
Which reference year to use for geographical coding. If it is missing, the global option orgdata.year will be used.
- implicitnull
Logical value. Default is TRUE, which adds implicit null to the dataset. The global option is orgdata.implicit.null.
- row
Select specific row number(s) only. Useful for debugging. Please read the Debugging article for details.
- base
Logical value. If TRUE, use the year in the original data as the base year to recode the geographical codes. Default is FALSE, which uses all available codes in the geo codebook.
- parallel
Logical or numeric value. With the logical value TRUE, processing runs in parallel using 50% (i.e. 0.5) of the local cores. Another proportion can be given if needed, for example parallel = 0.75 to use 75% of the cores. The maximum allowed is 80% of the cores. Default is FALSE, i.e. sequential processing. This argument is marked as deprecated in Usage.
- raw
Logical value. Default is FALSE as in config. If TRUE, read the original raw data directly from the source file even if the dataset is already available in DuckDB, without the need to unmark KONTROLLERT in the Access database.
- select
Select which valid files to process, as an alternative to using KOBLID. To select the first 5 files, write select = 1:5. Use select = "last" to select the last, i.e. most recent, file.
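The behaviour described for select can be pictured with a small base R sketch. The helper pick_files() and the KOBLID values below are hypothetical and not part of the package. The sketch also assumes that the valid files are ordered so that the last entry is the most recent one. The actual implementation may differ.

```r
# Hypothetical helper, NOT part of the package: mimics how `select`
# could narrow down the valid KOBLIDs of a file group.
pick_files <- function(koblids, select = NULL) {
  if (is.null(select)) {
    return(koblids)                  # no selection: every file is processed
  }
  if (identical(select, "last")) {
    # Assumption: the last entry corresponds to the most recent file
    return(koblids[length(koblids)])
  }
  koblids[select]                    # numeric positions, e.g. 1:5
}

koblids <- c(120, 121, 122, 123, 124, 125, 126)  # made-up KOBLID values

pick_files(koblids, select = 1:5)     # 120 121 122 123 124
pick_files(koblids, select = "last")  # 126
```

Selecting by position means the caller does not need to look up the KOBLID values first, which is why select is a convenient alternative to koblid for quick tests.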
See also
Other filegroups functions:
make_filegroups()
Examples
if (FALSE) { # \dontrun{
dt <- make_file("ENPERSON")
dt <- make_file("ENPERSON", raw = TRUE) # Skip DuckDB and read directly from original files
dt <- make_file("ENPERSON", koblid = 120:125) # Select specific files only
dt <- make_file("ENPERSON", select = "last") # Select most recent file
} # }
