inSilecoMisc: an overview

This vignette is a short overview of inSilecoMisc organized around the following themes:

  1. string manipulations
  2. vector manipulations
  3. data frame manipulations
  4. mathematical functions

String manipulations

LoremIpsum

It is sometimes useful to have a piece of random text to play with. loremIpsum() displays a piece of the commonly used placeholder text Lorem ipsum, with the option of requesting a specific number of words.

loremIpsum()
R>  [1] "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\n  tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\n  quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\n  consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\n  cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\n  proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n  Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius,\n  turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis\n  sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus\n  et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut\n  ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt\n  sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum.\n  Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget,\n  consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl\n  adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque\n  nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis,\n  laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu,\n  feugiat in, orci. In hac habitasse platea dictumst.\n  "
loremIpsum(10)
R>  [1] "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"

Keep a selection of words or letters

Assuming I need the second word of each string, all I have to do is:

keepWords(c(loremIpsum(18), "be or not to be"), 2)
R>  [1] "ipsum" "or"

and if I want a specific selection of words, I can use a sequence:

keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16))
R>  [1] "Lorem ipsum dolor sit tempor incididunt ut labore et"
R>  [2] "be or not to NA NA NA NA NA"

As you may have noticed, NAs are added when a selected position does not exist. This is useful but also annoying! Fortunately, the na.rm argument allows them to be removed:

keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16), na.rm = TRUE)
R>  [1] "Lorem ipsum dolor sit tempor incididunt ut labore et"
R>  [2] "be or not to"

Also, collapse = "-" allows the user to change the character used to separate words:

keepWords(loremIpsum(18), c(1:6, 14:18), collapse = "-")
R>  [1] "Lorem-ipsum-dolor-sit-amet-consectetur-ut-labore-et-dolore-magna"

and if collapse = NULL, a list is returned, with one vector of selected words per input string:

keepWords(c(loremIpsum(18), "be or not to be"), c(2:3), collapse = NULL)
R>  [[1]]
R>  [1] "ipsum" "dolor"
R>  
R>  [[2]]
R>  [1] "or"  "not"

Note that all punctuation signs are removed (this can be changed with the argument split_words)!
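
As a rough illustration of the behaviour shown above, the selection of words can be approximated in base R. This is only a sketch: the helper name keep_words_sketch() and the splitting pattern are assumptions made for illustration, not the package's actual implementation.

```r
# Hypothetical base R approximation of keepWords(); not the package code.
keep_words_sketch <- function(x, pos, collapse = " ") {
  # Split each string on runs of whitespace and/or punctuation (assumed rule).
  words <- strsplit(x, "[[:space:][:punct:]]+")
  # Out-of-range positions yield NA, which paste() turns into "NA".
  vapply(words, function(w) paste(w[pos], collapse = collapse), character(1))
}

keep_words_sketch("be or not to be", 2)
keep_words_sketch("be or not to be", c(1:4, 12:16))
```

Indexing a vector of words with positions that do not exist is what produces the NAs in this sketch, which matches the output shown earlier for the short string.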

There are two other functions that work similarly: keepLetters() and keepInitials(). The former allows the user to select letters:

keepLetters(loremIpsum(18), c(1:6, 14:18))
R>  [1] "Loremiorsit"
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = "-")
R>  [1] "L-o-r-e-m-i-o-r-s-i-t"
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = NULL)
R>  [[1]]
R>   [1] "L" "o" "r" "e" "m" "i" "o" "r" "s" "i" "t"

while the latter extracts initials:

keepInitials("National Basketball Association")
R>  [1] "NBA"
keepInitials("National Basketball Association", "-")
R>  [1] "N"

Note that if the input has a mixture of lower and upper case, so will the output:

keepInitials("National basketball association")
R>  [1] "Nba"

If this annoys you, the base functions tolower() and toupper() come in handy!

keepInitials(tolower("National basketball association"))
R>  [1] "nba"
keepInitials(toupper("National basketball association"))
R>  [1] "NBA"

Adjust the size of a character string

adjustStrings() has 5 arguments to adjust the strings of a character vector in a very flexible fashion:

  1. x: the input character vector to be adjusted;
  2. n: the number of characters to be added or used to constrain on the length of the output strings;
  3. extra: the character(s) to be added (0 is the default value);
  4. align: the string alignment (“right”, “left” or “center”);
  5. add: whether n should be the constraint or a number of characters to be added (a constraint by default).

By default, adjustStrings() uses n as a constraint on the length of the output strings. So with n = 4, all elements of the output vector have 4 characters:

adjustStrings(1:10, n = 4)
R>   [1] "0001" "0002" "0003" "0004" "0005" "0006" "0007" "0008" "0009" "0010"

And I can change the value of extra to specify the character(s) to be added:

adjustStrings(1:10, n = 4, extra = 1)
R>   [1] "1111" "1112" "1113" "1114" "1115" "1116" "1117" "1118" "1119" "1110"
adjustStrings(1:10, n = 4, extra = "a")
R>   [1] "aaa1" "aaa2" "aaa3" "aaa4" "aaa5" "aaa6" "aaa7" "aaa8" "aaa9" "aa10"
adjustStrings(1:10, n = 4, extra = "-")
R>   [1] "---1" "---2" "---3" "---4" "---5" "---6" "---7" "---8" "---9" "--10"
adjustStrings(1:10, n = 4, extra = "ab")
R>   [1] "aba1" "aba2" "aba3" "aba4" "aba5" "aba6" "aba7" "aba8" "aba9" "ab10"

With align, I can choose where extra characters are added:

adjustStrings(1:10, n = 4, extra = "-", align = "right") # default
R>   [1] "---1" "---2" "---3" "---4" "---5" "---6" "---7" "---8" "---9" "--10"
adjustStrings(1:10, n = 4, extra = "-", align = "left")
R>   [1] "1---" "2---" "3---" "4---" "5---" "6---" "7---" "8---" "9---" "10--"
adjustStrings(1:10, n = 4, extra = "-", align = "center")
R>   [1] "--1-" "--2-" "--3-" "--4-" "--5-" "--6-" "--7-" "--8-" "--9-" "-10-"

And if I set add = TRUE, exactly n extra characters are added to each string:

adjustStrings(1:10, n = 4, extra = "-", align = "right", add = TRUE)
R>   [1] "----1"  "----2"  "----3"  "----4"  "----5"  "----6"  "----7"  "----8" 
R>   [9] "----9"  "----10"
adjustStrings(1:10, n = 4, extra = "-", align = "left", add = TRUE)
R>   [1] "1----"  "2----"  "3----"  "4----"  "5----"  "6----"  "7----"  "8----" 
R>   [9] "9----"  "10----"
adjustStrings(1:10, n = 4, extra = "-", align = "center", add = TRUE)
R>   [1] "--1--"  "--2--"  "--3--"  "--4--"  "--5--"  "--6--"  "--7--"  "--8--" 
R>   [9] "--9--"  "--10--"

Note that in this case, the lengths of the output strings differ! One last remark about how adjustStrings() works when add = FALSE: for a given string, there are 3 scenarios:

  1. the string to be adjusted has more characters than n; in this case, the string is simply cut off:
adjustStrings("ABCD", n = 2, extra = "efgh")
R>  [1] "AB"
  2. the string has fewer characters than n, but the number of characters needed for the adjustment is smaller than the number of characters in extra; in this case, extra is cut off:
adjustStrings("ABCD", n = 6, extra = "efgh")
R>  [1] "efABCD"
  3. finally, when extra is too short to adjust the string according to n, extra is repeated:
adjustStrings("ABCD", n = 14, extra = "efgh")
R>  [1] "efghefghefABCD"
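
The three scenarios above can be reproduced with a few lines of base R. The sketch below only covers the default right alignment with add = FALSE, and the helper name pad_left_sketch() is hypothetical; it is meant to illustrate the logic, not to mirror how adjustStrings() is actually written.

```r
# Hypothetical base R sketch of the padding logic (right alignment, add = FALSE).
pad_left_sketch <- function(x, n, extra = "0") {
  x <- as.character(x)
  extra <- as.character(extra)
  vapply(x, function(s) {
    k <- n - nchar(s)                    # number of characters still missing
    if (k <= 0) return(substr(s, 1, n))  # scenario 1: the string is cut off
    # Repeat extra often enough (scenario 3), then trim it to k (scenario 2).
    pad <- strrep(extra, ceiling(k / nchar(extra)))
    paste0(substr(pad, 1, k), s)
  }, character(1), USE.NAMES = FALSE)
}

pad_left_sketch("ABCD", n = 2, extra = "efgh")
pad_left_sketch("ABCD", n = 6, extra = "efgh")
pad_left_sketch("ABCD", n = 14, extra = "efgh")
```

Repeating extra first and trimming it afterwards handles scenarios 2 and 3 with the same two lines, so no separate branch is needed for them.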

Extract file info

getDetails("path1/path2/foo.R")
R>     Name    Location Basename Extension Directory
R>  1 foo.R path1/path2      foo         R     FALSE
getDetails(list.files())
R>              Name Location   Basename Extension Directory
R>  1     overview.R        .   overview         R     FALSE
R>  2   overview.Rmd        .   overview       Rmd     FALSE
R>  3  table_odt.png        .  table_odt       png     FALSE
R>  4 tables_pdf.png        . tables_pdf       png     FALSE
getExtension("foo.R")
R>  [1] "R"
getBasename("foo.R")
R>  [1] "foo"

Assign a symbol to a p-value

sapply(c(.2, .08, .04, .008, 0.0001), signifSymbols)
R>  [1] "n.s." "."    "*"    "**"   "***"
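
For comparison, a similar mapping can be obtained in base R with cut(). The thresholds below (0.001, 0.01, 0.05 and 0.1) are the ones conventionally used in R summaries; that signifSymbols() uses exactly these cut-offs is an assumption, although they are consistent with the output above.

```r
# Base R sketch; the thresholds are assumed, not taken from the package source.
p_values <- c(0.2, 0.08, 0.04, 0.008, 0.0001)
symbols <- as.character(cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("***", "**", "*", ".", "n.s."),
  include.lowest = TRUE  # so that a p-value of exactly 0 is also classified
))
symbols
```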

applyString

applyString("cool", FUN = toupper, pos = 1:2)
R>  [1] "COol"
applyString(c("cool", "pro"), pattern = "o", FUN = toupper)
R>  [1] "cOOl" "prO"

Extract digits from strings

getDigits(c("a1", "032hdje2832"))
R>  [[1]]
R>  [1] "1"
R>  
R>  [[2]]
R>  [1] "032"  "2832"

Collapse elements of a vector and add element separators

commaAnd(c("Judith", "Peter", "Rebecca", "Eric"))
R>  [1] "Judith, Peter, Rebecca and Eric"

Vector manipulations

whichIs

vec <- LETTERS[1:7]
spl <- sample(vec)
whichIs(vec, spl)
R>  [[1]]
R>  [1] 7
R>  
R>  [[2]]
R>  [1] 4
R>  
R>  [[3]]
R>  [1] 5
R>  
R>  [[4]]
R>  [1] 1
R>  
R>  [[5]]
R>  [1] 2
R>  
R>  [[6]]
R>  [1] 3
R>  
R>  [[7]]
R>  [1] 6
id <- unlist(whichIs(vec, spl))
spl[id]
R>  [1] "A" "B" "C" "D" "E" "F" "G"

meanAlong

meanAlong(1:10, 2)
R>  [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
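
The output above is consistent with a moving average over windows of 2 consecutive values. Assuming that reading of the second argument is correct, the same numbers can be obtained in base R with embed(), which lays out the sliding windows as rows of a matrix:

```r
# Base R sketch of a moving average over windows of 2 consecutive values.
x <- 1:10
# Each row of embed(x, 2) holds one window: (x[t], x[t - 1]) for t = 2, ..., 10.
window_means <- rowMeans(embed(x, 2))
window_means
```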

scaleWithin

val <- runif(1000, 0, 100)
res1 <- scaleWithin(val, 20, 40, 60)

Data frame manipulations

Assign a category

(seqv <- stats::runif(40))
R>   [1] 0.16598681 0.12547642 0.27992596 0.72399320 0.52586732 0.90113295
R>   [7] 0.03015696 0.29393279 0.35718101 0.49142450 0.24995254 0.99526690
R>  [13] 0.48904702 0.08145321 0.09291758 0.98350707 0.98658177 0.10866281
R>  [19] 0.67279806 0.39162145 0.36948872 0.19751637 0.31270568 0.36768488
R>  [25] 0.01417330 0.49949339 0.43118302 0.15286055 0.62851665 0.67001062
R>  [31] 0.56486463 0.21142356 0.67153121 0.43431385 0.29165971 0.34591241
R>  [37] 0.75374342 0.45294877 0.19913201 0.12779252
categorize(seqv, categ=seq(0.1,0.9, 0.1))
R>   [1]  2  2  3  8  6 10  1  3  4  5  3 10  5  1  1 10 10  2  7  4  4  2  4  4  1
R>  [26]  5  5  2  7  7  6  3  7  5  3  4  8  5  2  2

Turn a matrix or a data frame into a square matrix

mat <- matrix(1:15, 3, 5, dimnames = list(LETTERS[3:1], LETTERS[1:5]))
print(mat)
R>    A B C  D  E
R>  C 1 4 7 10 13
R>  B 2 5 8 11 14
R>  A 3 6 9 12 15
squaretize(mat, reorder = FALSE)
R>    A B C  D  E
R>  C 1 4 7 10 13
R>  B 2 5 8 11 14
R>  A 3 6 9 12 15
R>  D 0 0 0  0  0
R>  E 0 0 0  0  0

Assign classes to data frames’ columns

df1 <- matrix(signif(runif(20),4), ncol = 2)
df2 <- setColClass(df1, 2, 'character')
str(df1)
R>   num [1:10, 1:2] 0.819 0.599 0.418 0.21 0.579 ...
str(df2)
R>  'data.frame':   10 obs. of  2 variables:
R>   $ V1: num  0.819 0.599 0.418 0.21 0.579 ...
R>   $ V2: chr  "0.3529" "0.08738" "0.5434" "0.309" ...

Create a data frame from scratch or from another data frame

See also this blog post on inSileco.

dfA <- data.frame(col1 = c(1, 2), col2 = LETTERS[1:2])
dfB <- data.frame(col1 = 2, col4 = "cool")

dfTemplate(2, 2)
R>    Var1 Var2
R>  1   NA   NA
R>  2   NA   NA
dfTemplate(2, 2, fill = 0)
R>    Var1 Var2
R>  1    0    0
R>  2    0    0
dfTemplate(c("value", "name"), 2, col_classes = c("numeric", "character"))
R>    value name
R>  1    NA <NA>
R>  2    NA <NA>

dfTemplateMatch(dfA, c("col4"))
R>    col1 col2 col4
R>  1    1    A   NA
R>  2    2    B   NA
dfTemplateMatch(dfA, dfB)
R>    col1 col2 col4
R>  1    1    A   NA
R>  2    2    B   NA
dfTemplateMatch(dfA, c("col1", "col4"), yonly = TRUE)
R>    col1 col4
R>  1    1   NA
R>  2    2   NA
dfTemplateMatch(dfA, c("col1", "col2"), yonly = TRUE, col_classes = "numeric", fill = 0)
R>    col1 col2
R>  1    1    A
R>  2    2    B

packagesUsed

packagesUsed(c('utils', 'methods'))
R>       name version
R>  1   utils   4.4.1
R>  2 methods   4.4.1

Export a data frame or a list of data frames

tblDown(list(CO2[1:2, ], CO2[3:6,]), "./tables.docx",
  section = "section", caption = "CO2", title = "Tables")

Messages

inSilecoMisc includes four simple message functions to standardize messages in scripts:

# 1. msgInfo() indicates what the upcoming computation is
msgInfo("this is what's gonna happen next")
R>  ℹ this is what's gonna happen next
# 2. msgWarning() reminds me of something important that should not affect the run
msgWarning("Got to be careful")
R>  ⚠ Got to be careful
# 3. msgError() when something went wrong (and I anticipated that it could happen)
msgError("Something wrong")
R>  ✖ Something wrong
# 4. msgSuccess() when a step or a computation has been successfully completed
msgSuccess("All good")
R>  ✔ All good

They are meant to help structure scripts; here is a somewhat contrived example:

scr_min <- function() {
  # msgInfo() lets me know where I am in the script
  msgInfo("Average random values")
  set.seed(111)
  out <- mean(runif(100))
  # msgSuccess() indicates the successful completion of this part
  msgSuccess("Done!")
  out
}
scr_min()
R>  ℹ Average random values
R>  ✔ Done!
R>  [1] 0.4895239

As these functions are based on message(), one can execute the script quietly by wrapping the call in suppressMessages():

# quiet run
suppressMessages(scr_min())
R>  [1] 0.4895239

Writing tables in a document

tblDown() writes a data frame (or a list of data frames) to a document in one of several formats (.docx by default):

# NB tblDown(head(CO2)) creates table.docx by default
tblDown(head(CO2), output_file = "table.odt")

tblDown() handles lists of data frames and the user can also pass a set of captions for every table and even separate them with section headers:

tblDown(list(head(CO2), tail(CO2)), output_file = "tables.pdf",
  caption = c("This is the head of CO2", "This is the tail of CO2"),
  section = "Table")

Note that if there are fewer captions or section titles than data frames, the vectors of captions (and/or sections) are repeated and an index is appended.
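
The recycling rule can be pictured with a small base R sketch. The helper recycle_labels() is hypothetical, and the exact form of the appended index (here a space followed by the table number) is an assumption; the labels written by tblDown() may be formatted differently.

```r
# Hypothetical sketch of label recycling; not the code used by tblDown().
recycle_labels <- function(labels, n) {
  # Enough labels: use the first n as they are.
  if (length(labels) >= n) return(labels[seq_len(n)])
  # Too few labels: repeat them and append an index (assumed format).
  paste(rep_len(labels, n), seq_len(n))
}

recycle_labels("Table", 2)
recycle_labels(c("This is the head of CO2", "This is the tail of CO2"), 2)
```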

Mathematical functions

Logistic functions

seqx <- seq(-5, 5, 0.1)
par(mfrow = c(1, 2))
plot(seqx, logistic(seqx), type = "l")
abline(v = 0, h = 0, lty = 2)
plot(seqx, logistic2(seqx, yzer = .5), type = "l")
abline(v = 0, h = 0, lty = 2)

Gaussian shape

plot(gaussianShape(1:1000, 500, 2, 250, pow=5), type='l')
lines(gaussianShape(1:1000, 500, 2, 250, pow=2), lty = 2)
lines(gaussianShape(1:1000, 500, 2, 250, pow=1), lty = 3)
legend("topleft", paste("pow = ", c(5, 2, 1)), lty = 1:3)