orgdata 1.5.0
- Make HDIR-systems default, keep FHI as an option.
 - Some internal functions made more general to be used in other packages
 
orgdata 1.4.9
- 
geo_merge()gains alocaltableargument. It can now be used on mapping tables generated withgeo_map_multi. - Add 
geo_map_multi(), to generate a multiyear mapping table. 9b3e1a1 - Fix 
geo_merge()d163c9a - Temporarily made it possible to run on both FHI and HDIR systems.
 - Allow specification of encoding in 
read_fileba980a7 
orgdata 1.4.7
- Can use different encodings for reading files and Access database. Encoding can be specified in 
optionswith eitherorgdata.encoding.accessororgdata.encoding.csv. Check config file. - Fix geo splitting in 
geo_recode()usingfix=TRUEas default argument. - Change options name from 
debug.rowtodebug.rows. - Fix problem updating global options when config file has changed (#320)
 
orgdata 1.4.6
- Should not give unnecessary warning with 
dummy_grk(#318) - Rename argument 
yeartoyear.geoinmake_file()to be more explicit (#317) 
orgdata 1.4.5
- Have access to future data from API (#316).
 - Use 
pakpackage for upgrade. 
orgdata 1.4.1
- Refactor some codes.
 - Specify encoding via config.yml file.
 - Use 
norgeopackage version 2.3.0 from Github ie. dev, instead of CRAN. 
orgdata 1.3.0
- Fix selecting files without 
KOBLID(#315) - Add more testing.
 - Avoid warning when connection to the database is already closed.
 - Create environment for status flow ie. 
orgEnv. - Can use 
selectarg to ease choosing file(s) inmake_file()instead ofKOBLID. Read the document on how to use the argument. - Refactor some codes.
 - Edit some documents.
 - Fix 
dbDisconnecterror (#312) - Edit message (#313)
 - Edit few other messages as well.
 - Manage encoding warning for R 4.1 and below.
 - Filtering with 
IBRUKTILandIBRUKFRAfor selecting original files.IBRUKTILuses>=andIBRUKFRAuses<of the specified date. (#309) - Update vignette on debugging.
 - Utilise input from config when mutating columns #ref
 - Add welcome logo.
 - Use integer (#310)
 - Proper use of boolean (#311)
 
orgdata 1.2.0
- Update vignette.
 - Use keyword 
deleteorslettwhen recoding with RE options to"". This replacesemptyandtomas in #285. - Selecting geo levels when geo codes derived from two columns will use the maximum number of digits to find geo level (#307)
 - Rewrite some codes for speed. (#231)
 - Add unidentified municipalities as 
xx99togeo_map(). Unidentified municipalities do not exist from API. (#304) - Age group with 
AgeCatcan use mix categories easily with[x]. See example infind_age_category(). (#305) - Fix bugs when writing and reading data from warehouse ie. marked the 
KONTROLLERTcolumn in Access (#298) - Fix bugs when accessing or deleting data with 
see_data()from data warehouse (#300) - Upgrade package version dependency.
 - Create unknown municipality codes with 
xx99ingeo_recode()function ie. geo record table in Access. (#302) - Recode unknown municipality codes with 
xx99for known county. (#303) - MANHEADER accepts regular expression to select columname in addition to column index (#301)
 - Function 
read_file()accept SPSS file too (#297) - Function 
read_file()now accept txt file extension. 
orgdata 1.1.0
- Make geo abbreviation clearer in the global options and no more depending on the order of geo names (#286)
 
orgdata 1.0
Order does matter when using functions in EXTRA column (#294)
Fix S3 age category functions
find_age_category()(#294)Delete unwanted column in the output when using
AgeCatfunction in EXTRA (#295)Use global options in input argument when relevant instead of using
match.arg()(#263)Edit document here and there.
Use codebook to recode age category for better speed (#292)
Read or delete data in the data warehouse can use multiple koblid in
see_data(). Use"all"inkoblidargument to select all data on the chosen filegroup (#291)Fixed duplicated age groups (#290).
Use either
emptyortomto represent regular expression to replace to since Access makes symbol""to be invisible (#285)Change function name from
see_org()tosee_data()for viewing data in the data warehouse.Use symbol
|to separate multiple arguments in columnEXTRA(#288)Group age to specific or specified interval with
AgeCat(). This function can be use in table for filegroup underEXTRAcolumn (#287 #289)Delete raw of similar columns with multiple specifications (#282)
Different ways to recode of similar column ie. duplicated, with defined lesid will give error.
Show current installed version against the new release version.
Some minor text editing.
Delete older file of DuckDB automatically since new version can’t read the older files (#280)
Actively ask users to update with given options when new release version is available.
Control all columns that should be numeric don’t contain any string. The columns to be controls for is now dynamic (#281)
Rename function alias from
rf()tordf()sincerf()is already in use in stats package.Deleting row with specification of lesid and without lesid ie. common filegroup, create errors. This is not fixed (#279)
Need to update version whenever a new release is available to ensure everyone is using the latest release. Else users can’t load the package.
Filter active files with date was done pragmatically ie. all files with
BRUKTILdate other than01-01-9999will be excluded. Now filtering with date will be compared against current date. (#272)Delete dataset from the database when unmark column
KONTROLLERTinstead of updating the dataset due to time consuming by updating it. The users have to mark the column to save or read the dataset in the database. (#278)Change default for argument
rawinmake_file()toFALSEas in config file.Some text editing here and there.
Function
is_colour_txt()can specify symbol directly without needing to rely on the global options or to usewithrpackage. Just for cosmetic purposes 😃Standard columns should be uppercase. Using lowercase creates error when GEO comes from two separated columns especially when handling recode from codebook (#277)
Standardize arguments names in
see_org()function.Parallel processing is deactivated since it doesn’t work smoothly with Access and DuckDB connection ie. DBI package, due to the problem with not-exportable objects for future package.
Replace SQLite with DuckDB again due to the speed for DuckDB. But since DuckDB is under active development, the new version might not work with the file created by older version. In such situation older DuckDB database should be deleted and re-run with the newer DuckDB version.
When loading the package, users will be reminded whenever new version is available.
Show as data frame when warning
NAs with coercioninstead of just the GEO number where the coercion took place (#274)Replace DuckDB with SQLite (#271)
Columnames are case insensitive (#115)
Use ellipsis for other arguments of
data.table::fwrite()insave_file()function.Implicit null for number of geo digits is moved to config file. This make it easy to maintain and expand.
Use
future.applypackage conditionally to reduce package vulnerability.Deactivate dependency packages for parallel processing. Use it conditionally ie. to install manually if needed.
Reshape multiple columns containing more than one
VALis handled properly. When reshaping multiple columns to be just oneVALthen leavingRESHAPE_VALempty ie. use all columns not defined inRESHAPE_IDshould work as before (#269)Give more information when error to reshape data, especially long reshape. Most of the time the source of error is in defining
RESHAPE_KOLandRESHAPE_VAL(#268)Can reshape multiple columns when reshaping to LONG. Applicable when there are multiple columns represent number of cases and one column represent denominator for all these numerators (#266)
Use KOBLID to add or delete from Duck database (#261)
Depends for norgeo package refers directly to CRAN version instead of GitHub repo.
Use function
geo_merge()to add geo granularity that aren’t from API. This can be a csv, xls or any other format that is accepted byread_file(#262)Deactive KONTROLLERT without the need to umark from Access database by using argument
raw = TRUEwhen using functionmake_file()(#264)Use KOBLID as table name in DuckDB instead of FILID since FILID is not a unique number ie. can be used by multiple KOBLID.
Update text and website
Change function name from
geo_levels()togeo_map()for mapping geo codes granularity.Change helper function names for
read_file(). (#250)Fixed bugs when reading file from the web with https (#251)
read_file()accept Stata file withdtaextension (#252)Use yaml format for global configuration file to ease update. The file is located in config repo (#256)
Raw data that have been controlled for are saved in DuckDB database format. This increase reading speed especially for big files. Mark column KONTROLLERT in the Access registration database to activate this function. Unmark to read from the original raw file instead. (#257)
Use
see_org()to read the raw data in the database. Argumentaction = "delete"can be used to delete the data from the raw database.Function
geo_merge()for merging geo codes that aren’t available from API to the mapping table ie. tblGeo, in the geo database. The data could be in any file format accepted byread_file()function. The data to be merged must have column to be merged ie.id.file, that is equivalent to the column id in the database ie.id.table. Theid.filemust be unique.Handle unbalanced parentheses in post recode whenever possible when
rawis used, else give error message. #246read_file()accept filegroup name as argument infileto read the completed file after runningmake_file()function. #247Debug functions can go deeper to show helper functions as well with
debug_opt("deep")oroptions(orgdata.debug = "deep")#243Post recode uses type
PSin codebook. The function is used when there is a need to recode the value of a column after the dataset have been clean and aggregated. Specification to select the row to be recoded uses either standard expression or R syntax ofdata.tablestyle. When using R syntax the value must haverawprefix eg.raw(AAR %in% c(2000, 2005)). #244 #245Delete rows with
"-"minus symbol inTILcolumn in the codebook is accepted fordo_recode_post().Able to aggregate to country level in addition to other geographical levels (#240)
Download the geo code to aggregate with only once despite the different geographical levels the dataset to be aggregated into (#241)
Update package via function
update_orgdata(). Basically it’s just a wrapper forremotes::install_github().Can aggregate to country level as well (#240)
Speed up aggregate process (#241)
Fixed the bugs with reshape wide consisting multiple dimensions ie. more than one TAB columns. (#228)
Ensure melting columns are not converted to factor (#234)
Some columns should be numeric type and these columns are GEO, AAR, KJONN, ALDER and all VALs. They are also specify in global options
orgdata.num. Ensure thise columns are numeric and give warning as well as log when coercion where NA is introduced. (#235)Use log file to check what or which codes that have problem. (#237)
Warn and give logfile if GEO can’t be converted to integer before geo recode. (#233 #236)
Able to read files that have no specific extension but they have to be coma or semicolon separated (#227)
Ensure all VAL columns are numeric type (#229)
Give explicit warning when GEO codes have character that can cause coercion resulting in converting GEO codes to
NA(#229)Log files for
code99also include koblid. The files will be named ascode99_koblidxxx(#222)Running parallel processing is unstable. Not sure if the process crash due to MS Access that can’t handle constant requests or … Anyway, running parallel now will only use 50% of the available cores (#224)
User can specify more or less cores than 50% if needed. To use 75% cores by specifying
parallel = 0.75in the argmentmake_file()or in the global optionsorgdata.parallel(#225)Connect to both database with common function. (#212)
Use global options
orgdata.yearto specify production year if not using current year. (#216)Use parallel processing in
make_file()with argumentparallel = TRUE. (#217)Get feedback for successful and unsuccessful filegroups when running
make_filegroups(). (#199)Reshape wide is not limited to only 3 reshape columns as it was. (#200 #201)
Create log files in default orgdata folder instead of occupying the REM. (#202)
Delete old bydel codes ie. before 2003, except for Oslo. This is due to the unstable geographical recoding of bydel. Use
EXTRAcolumn on filegroup level with argumentDeleteOldBydel. (#204 #206)Log files use
KOBLIDto be more specific. (#208)Reshape to wide format can now implement multiple reshape dimensions which are usually one or multiple TAB columns. For instance for LESEFERD of
tot_elevis the product of bothmestringsnivåandklassetrinn. (#188)The output will be arrange on a standard column order. (#190)
Debugging will be easier with a wrapper function
debug_opt(). (#196)Fixed #147 to implement reshape wide format to create denominator. This function is only applicable to certain file structure received from SSB. For example filegroup LESEFERD where column
tot_elevrepresents the total number of student withmestringsnivåand not the grand total of students. The number of students withmestringsnivåis represented in columnant_elev. Therefore the long format formestringsnivåneeds to be restructured to wide with value from theant_elevto ensure summing uptot_elevwhen creating denominator will not create a grand total of students instead of the total number of students withmestringsnivå. (#184)Fixed #162 when saving geographical codebook with
write = TRUEand the table doesn’t exist in the geo-code database.Stop looping the geographical levels when
orgdata.debug.geoororgdata.debug.aggregateare active and make the default tokommune. (#166)Add batch date from codebook as reference when debugging. (#168)
Hide warning messages if file has been controlled for any possible errors. It means the column KONTROLLERT in table for original files is marked for the specific file. (#171)
Refactoring function for recode of geographical codes. (#174)
Geo codes for enumeration areas and towns that aren’t able to be merged will be converted to
unknowncodes with eitherxxxx9999orxxxx99. (#177)Create default folder at
C:/Users/YourUserName/orgdata_logswhenpathargument is not specified insave_file(). (#179)Create unknown codes for enumeration areas and town codes if not able to recode. Most probably is due to the enumeration codes before 2002. The Excel file from SSB is not clean and unstable to be used for recoding. The unknown codes are based on municipality codes with added
xxxx99orxxxx9999. As in #177 but recode is done on municipality codes before merging back to the original dataset. (#182)Add
codeDeleteinlogfor geographical codes that aren’t able to be merged. The codes will be excluded in the dataset. To access all the deleted codes uselog$codeDelete. (#149)Give explicit error message when
pathis missing insave_file(). (#152)Able to select any valid year to recode geographical codes and aggregate accordingly. But recoding geographical codes backward isn’t possible. (#153)
Error message will split a long vector if exist. (#153)
Add column with batch date when the geographical codes were downloaded form SSB when creating a geo codebook with
geo_map()orgeo_recode(). (#156)Actively select only the first geo code when old geo code is split into multiple new geo codes. (#159)
- 
Able to select a base year for geo recode based on the year available in the original data. This is available in column
AARin the dataset. Use argumentbaseor global optionorgdata.recode.basewith logical input.TRUEwill select the base year for recoding geographical code from the year of the original file to the current year. Default isFALSEie. include all available geographical codes available in the codebook. (#157) Fixed #139 for
orgdata.debug.geokeep original geo codes for enumeration areas before adding 9999. (#140)Fixed #142 show codes that have problem to recode directly instead of row numbers (#144)
Save all codes that have problem in
logenvironment for easy access. To list the codes is either withlog$code00orlog$codeShortRecode geo even when argument
aggregate = FALSEinmake_file()function.Rename
make_filegroupandlag_filgruppeto plural ie.make_filegroupsandlag_filgrupper.Use options
orgdata.debug.rowsto select only specific row(s) for further processing. It can be activated via global options withoptions(orgdata.debug.rows = 20:50)or via argumentrow = 20:50inmake_file()to select row 20 to 50.Make multiple filegroups via
make_filegroups. (#137)Fixed #132 LANDSSB must be string
Convert whitespace to NA to be able to delete all rows with NA
Fixed #118 warning text when column(s) aren’t defined in FILGRUPPE and will be deleted (#128)
Edit error message for columns with existing NA value before aggregating. Total value will be NA and this will conflict with the allready existing NA category in the aggregated column(s). Therefore existing NA value in the selected column(s) must be recoded to a valid value.
Use
options(orgdata.debug.geo = TRUE)to keep old geo codes for debuging (#120)Use
reset_options()to reset to default options.Warn when process discontinued due to debugging.
Add vignettes for Standardize git and Debugging
Change database filename to raw-database_BE.accdb
Fixed #108 #112 grunnkrets codes that have changed before 2002 not available via API from SSB while code changes for municipality includes changes from 1977. Check from SSB website. We use the municipality codes to create uspesified grunnkrets codes for data before 2002 (#109 #113)
Fixed #110 updating SQL code for new table name for codebook (#111)
Check columns to aggregate for any possible
NA(#98). Columns that haveNAshould be recoded touoppgittor something equivalent since leaving the category toNAwill conflict withNArepresenting total value when aggregating.Fixed #100 for grunnkrets that ends with
00have no correspond codes from SSB API. Need to add it manually (#101)Fixed #99 when geo codes fails to be recoded then the row index will be shown (#103)
Geo codes ends with 4 zeros
xxxx0000neither have equivalent codes from SSB nor representing a correct coding structure as so calledDelområdethat ends with 2 zerosxxxxxx00. To avoid missing the information, these geo codes are recoded toxxxx9999with functionis_grunnkrets_0000()as in PR (#103).see_file()accepts just a single numeric as well.Fixed #85
see_file()list all the columns when columnames or column indexes are not specified. The variables are sorted whenever possible. (#87)Add more function tests (#88)
Exclude
LANDSSBin aggregate when split toLANDBAKandINNVKAT. This is because code0will be recoded to20when split and causes unnecessary more rows (#84)Delete deprecated functions.
Fixed #93 when source level can’t be identified due to
NA.Fixed #95 for grunnkrets codes that aren’t missing but have less number of digits ie. less than 7 digits. Assuming these are codes for municipality then
9999is added at the end of these codes (#96)Gives row number for GEO codes that get coerced as
NAwhen converted to integer. This will make it easy to check in the original raw data (#96)Aggregate now give total to all dimensions including those specified in
AGGKOL(#82)Function
see_file()accept column index as well (#83)Recode variables using regular expression when defined in codebook with type
RE. Finding pattern can either be written in ordinary regular expression ie.\\d{4}.*or withrex()package. (#78)New feature for checking categories for variables with
see_file()(#75)Fixed #65 make TABS and VALS dynamic for easy extension for these columns (#66)
Fixed #63 implicit null includes all possible VAL columns when exist (#69)
Fixed #67 aggregate with total values for standard variables ie.
UTDANN,LANDSSB,LANDBAKandINNVKAT(#72)Fixed #61 use AGGKOL in Access registration database to specify other columns to aggregate other than the standard eg.
KJONN,TAB1,TAB2etc. (#73)Fixed #55 to recode standard variables via codebook instead of hard coded (#58)
Fixed #57 split column with duplicated values will keep the original column (#60)
Fixed #56 aggregate all VAL columns whenever specified and not only specific to
VAL1(#62)Edit verbose messages
Reshape dataset from wide to long. Reshape can have more than one
measure variables. Please read how this is specified in Access registration database.Split columns must have equal number of values to the defined
SPLITTIL. Duplicate the value if it is less than the maximumSPLITTIL. For example for value0in columnLANDSSBwhich will be split intoLANDBAKandINNVKAT, the value will be duplicated into00to avoid split with valueNA.Recode for
LANDBAKandINNVKATafter aggregating are done internally ie. hard coded, indo_aggregate_recode_standard(). Total is coded with20. Any eventuality for future change should also look other related functions such asis_aggregate_standard_cols()andis_col_num().Change argument parameter for
find_spec()function.Update text document in several places.
Add colour type warn2 for warning message without
Warning:prefix.Request (#43) messages with specific colour
Fixed (#46) recode to string even though columns is type integer or numeric.
Unknown bydel ie. (uoppgitt) is added when enumeration areas codes ie. (grunnkrets) for bydel is
XXXX9999in functiongeo_level().Add unknown grunnkrets for kommune when not available since some of the datasets have unknown grunnkrets that aren’t listed in API downloaded data (#39).
Exclude
TAB1,TAB2andTAB3from being aggregated. (#44)Recode for aggregated variables uses
AGin TYPE column in the codebook instead of FILGRUPPE withAGGREGATEas it was implemented in ver 0.2.0. This will make it possible so specify FILGRUPPE and LESID to implement the principle for GENERAL, COMMON and SPECIFIC variables.Change function name
do_aggregate_recodetodo_aggregate_recode_standardfor standard variables.Recode for aggregated categories can be defined in Recode form ie. codebook, and use
AGGREGATEin the specification under FILGRUPPEDelete rows when defined in codebook using minus symbol under TIL column. Similar principles is implemented for GENERAL, COMMON and SPECIFIC feature as in recode. Read detail in ver 0.0.5 - alpha.
Display both columnames to be recoded that are found in the dataset or those that aren’t found when defined as
ALLEin the codebook so user will be aware of its existence.Standardize some most used arguments to
read_file()such asnrows,header,skip,trimwsandna. Read details inread_file()function description.Output to
read_file()as data.table class.Use standard columnames with
V1,V2etc when argmentheader = FALSEis specified.Error message with list of unmatch columns in
do_column_standard().Give clearer message and debug message eg.
Execute: read_file().Change
MAPPEtoUTMAPPEto make it more explicit for path specification to save file.Defun
orgdata.activeglobal options to use columnames from original dataset.Use global options
options(orgdata.debug.nrow = TRUE)to read only first 20 rows. Suitable for debug purposes.Fix (#28) GEO derived from two columns with empty INNLESARG.
Add column
LEVELfor granularity level ie. grunnkrets, fylke, kommune, bydel etc
orgdata 0.0.1 - alpha version
When MANHEADER is used then the new columname must be specified in the respective standard column (#21)
Deprecated arguments
geoandvalinmake_file. Output data must use standard columnames instead of keeping the columnames from original dataset.Rename functions
read_raworlesrawtomake_fileorlag_fil(#27)Alle functions uses underscore “_” for both english and norwegian.
Add new columns if one of the standard columns is missing in the original data. The value to be inserted to the new column must use symbol less than
<and more than>. For instance when columnKJONNdoesn’t exist in the original data, we can specify with<2>in under columnKJONNin the Access registration database. The output will add a new columnKJONNwith value2. (#15)Default
orgdata.verboseisTRUE.Options for
orgdata.implicit.nullwith default asTRUE. Useoptions(orgdata.implicit.null = FALSE)to deactivate (#19)Use version specific for imported packages.
Rename standard column
LANDBAKtoLANDSSBfor column in original data received from SOB containing information about country of origin.Save file as specified in column
MAPPEin Access registration database or specify inpathargument for functionsave_file. (#12)Changes is in PR #8 is as the following:
Create GEO code from two separate columns. This has to be defined in Access registration under
GEOwith comma separated eg.nameGeoCol1, nameGeoCol2.- 
Order standard columns in the output dataset with this order for the first four columns:
- 
GEO,AAR,ALDER,KJONN 
 - 
 Change norwegian name for
save_filefromlagfiltolagrefil.Use column name
KOLNAVNinstead ofADDKOL.Rename function
do_addcolsandget_addcolstodo_colnameandget_colnameto be consistent with the changes in Access registration database.Changes is in PR #4 is as the following:
- 
Recode variables from specification in
tbl_KodeBokuses:- 
GENERAL variables are defined in FILGRUPPE as 
ALLEand are used to recode variables in all groups. - COMMON variables are when FILGRUPPE is specified but have empty LESID. This will recode variables within selected group.
 - SPECIFIC variables are when FILGRUPPE and LESID are specified. This will recode variables in that specified FILGRUPPE of the specified FILID.
 
 - 
GENERAL variables are defined in FILGRUPPE as 
 - 
When all these three specification exist in
tbl_KodeBok:- SPECIFIC variables will overrule COMMON variables
 - COMMON variables will overrule GENERAL variables
 
 Write as
<NA>in codebook under columnFRAwhen specifying missing variables indicating that a missing column to be recoded to value in columnTIL. This will differentiate between real missing and a real column value ofNA. (#5)Error message will be given if LESID is specified without FILGRUPPE since LESID is not unique ID.
orgdata 0.0.4 - alpha
- MANHEADER and KOLNAVN uses common helper function 
is_col_separate() - MANHEADER 
oldconvert to integer and use and index for columns 
Changes is in PR #2
