This vignette is a short overview of inSilecoMisc organized around the following themes:
It is sometimes useful to have a piece of random text to play with. loremIpsum() displays a piece of the commonly used placeholder text Lorem ipsum, with the option of requesting a specific number of words.
loremIpsum()
R> [1] "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\n quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\n consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\n cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\n proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius,\n turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis\n sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus\n et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut\n ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt\n sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum.\n Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget,\n consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl\n adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque\n nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis,\n laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu,\n feugiat in, orci. In hac habitasse platea dictumst.\n "
loremIpsum(10)
R> [1] "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"
Assuming I need the second word, all I have to do is:
and if I want a specific selection of words I can use a sequence:
keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16))
R> [1] "Lorem ipsum dolor sit tempor incididunt ut labore et"
R> [2] "be or not to NA NA NA NA NA"
As you may have noticed, NAs are added when the selected position does not exist. This is useful but can also be annoying! Fortunately, the na.rm argument allows you to remove them:
keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16), na.rm = TRUE)
R> [1] "Lorem ipsum dolor sit tempor incididunt ut labore et"
R> [2] "be or not to"
Also, the collapse argument (here collapse = "-") allows the user to change the character used to separate words:
keepWords(loremIpsum(18), c(1:6, 14:18), collapse = "-")
R> [1] "Lorem-ipsum-dolor-sit-amet-consectetur-ut-labore-et-dolore-magna"
and if collapse = NULL, a list is returned that contains one vector of the selected words per input string:
keepWords(c(loremIpsum(18), "be or not to be"), c(2:3), collapse = NULL)
R> [[1]]
R> [1] "ipsum" "dolor"
R>
R> [[2]]
R> [1] "or" "not"
Note that all punctuation signs are removed (this can be changed with the split_words argument)!
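To make the selection logic concrete, here is a rough base-R sketch of word picking. The helper `pick_words()` is hypothetical and is not how keepWords() is implemented; it assumes words are split on runs of whitespace and punctuation.

```r
# Hypothetical helper (not part of inSilecoMisc): select words by position.
pick_words <- function(x, pos, collapse = " ") {
  # Split each string on runs of whitespace and/or punctuation
  words <- strsplit(x, "[[:space:][:punct:]]+")
  # Indexing past the last word yields NA, as in the outputs above
  picked <- lapply(words, function(w) w[pos])
  if (is.null(collapse)) {
    return(picked)
  }
  vapply(picked, function(w) paste(w, collapse = collapse), character(1))
}

pick_words("be or not to be", c(1:4, 12:16))
# [1] "be or not to NA NA NA NA NA"
pick_words("be or not to be", 2:3, collapse = NULL)
```

With collapse = NULL the sketch returns a list holding one character vector per input string, mirroring the behavior described above.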
There are two other functions that work similarly: keepLetters() and keepInitials(). The former allows the user to select letters:
keepLetters(loremIpsum(18), c(1:6, 14:18))
R> [1] "Loremiorsit"
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = "-")
R> [1] "L-o-r-e-m-i-o-r-s-i-t"
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = NULL)
R> [[1]]
R> [1] "L" "o" "r" "e" "m" "i" "o" "r" "s" "i" "t"
while the latter extracts initials
keepInitials("National Basketball Association")
R> [1] "NBA"
keepInitials("National Basketball Association", "-")
R> [1] "N"
Note that if the input mixes lower and upper case, so will the output. If this annoys you, the base functions toupper() and tolower() come in handy!
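As a minimal sketch of that clean-up step, base R's toupper() and tolower() normalize case. The input below is a made-up string of mixed-case initials, not actual keepInitials() output.

```r
# Made-up mixed-case initials, used only for illustration
initials <- "nBa"
toupper(initials)  # "NBA"
tolower(initials)  # "nba"
```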
adjustStrings() has 5 arguments to adjust the strings of a character vector in a very flexible fashion:
- x: the input character vector to be adjusted;
- n: the number of characters to be added or used to constrain the length of the output strings;
- extra: the character(s) to be added (0 is the default value);
- align: the string alignment (“right”, “left” or “center”);
- add: whether n should be the constraint or a number of characters to be added (a constraint by default).
By default, adjustStrings() uses n as a constraint for the length of the output strings. So if I use n = 4 instead of n = 2 in the first example, all elements of the output vector will have 4 characters:
adjustStrings(1:10, n = 4)
R> [1] "0001" "0002" "0003" "0004" "0005" "0006" "0007" "0008" "0009" "0010"
And I can change the value of extra to specify the replacement character(s) to be used:
adjustStrings(1:10, n = 4, extra = 1)
R> [1] "1111" "1112" "1113" "1114" "1115" "1116" "1117" "1118" "1119" "1110"
adjustStrings(1:10, n = 4, extra = "a")
R> [1] "aaa1" "aaa2" "aaa3" "aaa4" "aaa5" "aaa6" "aaa7" "aaa8" "aaa9" "aa10"
adjustStrings(1:10, n = 4, extra = "-")
R> [1] "---1" "---2" "---3" "---4" "---5" "---6" "---7" "---8" "---9" "--10"
adjustStrings(1:10, n = 4, extra = "ab")
R> [1] "aba1" "aba2" "aba3" "aba4" "aba5" "aba6" "aba7" "aba8" "aba9" "ab10"
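The outputs above suggest that the characters of extra are recycled, and truncated where needed, to fill the gap up to n. The base-R sketch below reproduces that padding for the default right alignment. `pad_left()` is a hypothetical helper, not the package's implementation, and its handling of strings longer than n (keeping the first n characters) is an assumption.

```r
# Hypothetical helper (not part of inSilecoMisc): left-pad `x` to `n`
# characters by recycling the characters of `extra`.
pad_left <- function(x, n, extra = "0") {
  x <- as.character(x)
  chars <- strsplit(as.character(extra), "")[[1]]
  vapply(x, function(s) {
    gap <- n - nchar(s)
    if (gap <= 0) {
      # Assumption: an over-long string keeps its first n characters
      return(substr(s, 1, n))
    }
    # rep_len() repeats `extra` when it is too short and cuts it when too long
    paste0(paste(rep_len(chars, gap), collapse = ""), s)
  }, character(1), USE.NAMES = FALSE)
}

pad_left(c(1, 10), n = 4, extra = "ab")
# [1] "aba1" "ab10"
```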
With align, I can choose where extra characters are added:
adjustStrings(1:10, n = 4, extra = "-", align = "right") # default
R> [1] "---1" "---2" "---3" "---4" "---5" "---6" "---7" "---8" "---9" "--10"
adjustStrings(1:10, n = 4, extra = "-", align = "left")
R> [1] "1---" "2---" "3---" "4---" "5---" "6---" "7---" "8---" "9---" "10--"
adjustStrings(1:10, n = 4, extra = "-", align = "center")
R> [1] "--1-" "--2-" "--3-" "--4-" "--5-" "--6-" "--7-" "--8-" "--9-" "-10-"
And if I set add = TRUE, then exactly n extra characters are added to each string:
adjustStrings(1:10, n = 4, extra = "-", align = "right", add = TRUE)
R> [1] "----1" "----2" "----3" "----4" "----5" "----6" "----7" "----8"
R> [9] "----9" "----10"
adjustStrings(1:10, n = 4, extra = "-", align = "left", add = TRUE)
R> [1] "1----" "2----" "3----" "4----" "5----" "6----" "7----" "8----"
R> [9] "9----" "10----"
adjustStrings(1:10, n = 4, extra = "-", align = "center", add = TRUE)
R> [1] "--1--" "--2--" "--3--" "--4--" "--5--" "--6--" "--7--" "--8--"
R> [9] "--9--" "--10--"
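Note that in this case, the lengths of the output strings differ! One last remark about how adjustStrings() works when add = FALSE: for a given string, there are 3 scenarios:
1. the string has more characters than n; in this case, the string is simply cut off;
2. the string has fewer characters than n and the difference is smaller than the number of extra’s characters; in this case, extra is cut off;
3. the string has fewer characters than n and extra is too short to adjust the string according to n; in this case, extra is repeated.
getDetails("path1/path2/foo.R")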
R>    Name    Location Basename Extension Directory
R> 1 foo.R path1/path2      foo         R     FALSE
getDetails(list.files())
R>             Name Location   Basename Extension Directory
R> 1     overview.R        .   overview         R     FALSE
R> 2   overview.Rmd        .   overview       Rmd     FALSE
R> 3  table_odt.png        .  table_odt       png     FALSE
R> 4 tables_pdf.png        . tables_pdf       png     FALSE
getExtension("foo.R")
R> [1] "R"
getBasename("foo.R")
R> [1] "foo"
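For comparison, base R and the tools package (shipped with R) offer similar path helpers. This sketch uses only those functions and does not reflect how getDetails() is implemented.

```r
path <- "path1/path2/foo.R"
basename(path)                             # "foo.R"
dirname(path)                              # "path1/path2"
tools::file_ext(path)                      # "R"
tools::file_path_sans_ext(basename(path))  # "foo"
```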
(seqv <- stats::runif(40))
R> [1] 0.16598681 0.12547642 0.27992596 0.72399320 0.52586732 0.90113295
R> [7] 0.03015696 0.29393279 0.35718101 0.49142450 0.24995254 0.99526690
R> [13] 0.48904702 0.08145321 0.09291758 0.98350707 0.98658177 0.10866281
R> [19] 0.67279806 0.39162145 0.36948872 0.19751637 0.31270568 0.36768488
R> [25] 0.01417330 0.49949339 0.43118302 0.15286055 0.62851665 0.67001062
R> [31] 0.56486463 0.21142356 0.67153121 0.43431385 0.29165971 0.34591241
R> [37] 0.75374342 0.45294877 0.19913201 0.12779252
categorize(seqv, categ = seq(0.1, 0.9, 0.1))
R> [1] 2 2 3 8 6 10 1 3 4 5 3 10 5 1 1 10 10 2 7 4 4 2 4 4 1
R> [26] 5 5 2 7 7 6 3 7 5 3 4 8 5 2 2
See also this blog post on inSileco.
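As a base-R point of comparison, findInterval() yields the same category indices for the values shown above. How categorize() treats values that fall exactly on a break is not covered here, so boundary behavior may differ.

```r
# Four values taken from seqv above (positions 1, 4, 6 and 7)
x <- c(0.16598681, 0.72399320, 0.90113295, 0.03015696)
breaks <- seq(0.1, 0.9, 0.1)
# findInterval() counts the breaks at or below each value; adding 1 gives
# a 1-based category index
findInterval(x, breaks) + 1
# [1]  2  8 10  1
```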
dfA <- data.frame(col1 = c(1, 2), col2 = LETTERS[1:2])
dfB <- data.frame(col1 = 2, col4 = "cool")
dfTemplate(2, 2)
R>   Var1 Var2
R> 1   NA   NA
R> 2   NA   NA
dfTemplate(2, 2, fill = 0)
R>   Var1 Var2
R> 1    0    0
R> 2    0    0
dfTemplate(c("value", "name"), 2, col_classes = c("numeric", "character"))
R>   value name
R> 1    NA <NA>
R> 2    NA <NA>
dfTemplateMatch(dfA, c("col4"))
R>   col1 col2 col4
R> 1    1    A   NA
R> 2    2    B   NA
dfTemplateMatch(dfA, dfB)
R>   col1 col2 col4
R> 1    1    A   NA
R> 2    2    B   NA
dfTemplateMatch(dfA, c("col1", "col4"), yonly = TRUE)
R>   col1 col4
R> 1    1   NA
R> 2    2   NA
dfTemplateMatch(dfA, c("col1", "col2"), yonly = TRUE, col_classes = "numeric", fill = 0)
R>   col1 col2
R> 1    1    A
R> 2    2    B
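The core idea behind dfTemplateMatch(), adding the columns a data frame lacks, can be approximated in base R. The sketch below is an assumption about the general approach rather than the package's code, and it ignores yonly, fill and col_classes. The names `dfC` and `wanted` are introduced here for illustration.

```r
dfC <- data.frame(col1 = c(1, 2), col2 = LETTERS[1:2])
wanted <- c("col1", "col4")
# Columns requested but absent from the data frame
missing_cols <- setdiff(wanted, names(dfC))
# Create them, filled with NA
dfC[missing_cols] <- NA
names(dfC)
# [1] "col1" "col2" "col4"
```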
inSilecoMisc
includes four simple message functions to
standardize messages in scripts:
# 1. msgInfo() indicates what the upcoming computation is
msgInfo("this is what's gonna happen next")
R> ℹ this is what's gonna happen next
# 2. msgWarning() reminds me of something important that should not affect the run
msgWarning("Got to be careful")
R> ⚠ Got to be careful
# 3. msgError() when something went wrong (and I anticipated that it could happen)
msgError("Something wrong")
R> ✖ Something wrong
# 4. msgSuccess() when a step or a computation has been successfully completed
msgSuccess("All good")
R> ✔ All good
They are meant to help structure scripts; here is a somewhat contrived example:
scr_min <- function() {
  # msgInfo() lets me know where I am in the script
  msgInfo("Average random values")
  set.seed(111)
  out <- mean(runif(100))
  # msgSuccess() indicates the successful completion of this part
  msgSuccess("Done!")
  out
}
scr_min()
R> ℹ Average random values
R> ✔ Done!
R> [1] 0.4895239
As these functions are based on message(), one can execute the script quietly by wrapping the call in suppressMessages():
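A minimal base-R sketch of that pattern follows. `noisy_mean()` is a hypothetical stand-in that calls message() directly, so it runs without inSilecoMisc installed; the same wrapping should apply to any function that reports through message().

```r
# Hypothetical stand-in for a script that reports progress via message()
noisy_mean <- function() {
  message("Average random values")
  set.seed(111)
  out <- mean(runif(100))
  message("Done!")
  out
}

# Messages are silenced; the return value is still available
res <- suppressMessages(noisy_mean())
res
```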
tblDown() writes a data frame (or a list of data frames) to a document in various formats (.docx by default). When given a list of data frames, the user can also pass a caption for each table and even separate the tables with section headers:
tblDown(list(head(CO2), tail(CO2)), output_file = "tables.pdf",
  caption = c("This is the head of CO2", "This is the tail of CO2"),
  section = "Table")
Note that if there are fewer captions or section titles than data frames, the vectors of captions (and/or sections) are repeated and an index is appended.