Density Plots for Dates • datplot

Why?

A rather common problem in archaeology is the fuzziness of dates assigned to objects. If one wants to visualize overall changes in - let’s say - pottery consumption, bar charts often fall short in that regard. If we have Phases a – f, then some of the objects can usually be dated to a, c, and f, as an example, but others will by classified as “a to c” or “b to c”. But how can these data still be used for examining changes in a large set of objects, respecting the quality of their dating? The answer lies in aoristic analysis, which was implemented in criminology to assess the general temporal occurrence of burglaries from the reported time span in between which individual burglaries had happened (Ratcliffe 2000). This concept has been implemented in archaeology before (see Orton, Morris, and Pipe (2017), 3–5 for a recent overview; generally credited with the first implementation for archaeology is Johnson (2004)). There are other R-packages that implement aoristic analysis in archaeology, usually with a less object-based focus and with differing algorithms (namely aoristAAR, archSeries, tabula and rtefact). We wholeheartedly recommend trying out those packages as well to find the best fit for your data. This implementation (datplot) is kept rather simple. The case study with which the package is published (Weissova and Steinmann 2021) addresses classical archaeology, but the package itself is in no way restricted to any discipline.

How?

First, it is important to translate the phases into numbers. This should be easily possible for archaeological data, as there is usually an accepted date for each phase or period. While not directly providing numeric or absolute chronological values, e.g. the iDAI chronontology provides approaches for unified terminology. Usually, this would depend on the discipline and geographical frame of research in particular and has to be chosen for each analysis. When a concordance for phases and their absolute dating exists, it should be easy to apply these to single objects. We have illustrated the process of cleaning and transforming real world archaeological data in a very manual solution in the “Data Preparation and Visualization”-vignette to this package. Using may cause other problems in the end, since such phases are often employed to avoid dates, but it is necessary as the aim is to visualize the distribution on a (pseudo-)continuous scale, for which numbers are needed. Also, this step may be reversed for the final visualization by supplementing or replacing the scale on the x-axis with the respective phases. An automated process has not been implemented yet.

Ideally, one can produce a ‘beginning’ and ‘end’ date for each object, or let’s say an earliest possible dating and a latest possible dating, e.g. corresponding to beginning and start of each phase the object is dated to.

To show and explain how this would work, we chose a random sample of Athenian pottery from the Beazley archive (Oxford n.d.), as it is a large publicly available data set. (Since the format provided by the BAPD is slightly different from that needed here we converted the data beforehand to match requirements. No values have been changed. The sample data set is included in datplot. In a recent version the Inscriptions of Bithynia data set has been added to datplot as well. Its cleaning and formatting is detailed in the “data_preparation”-vignette as an example for scholars looking for approaches to the transformation of verbose and heterogeneous data. The same vignettes also suggests some visualizations of said data set.)

library(datplot)
data(Beazley)

	Vase.Number	Technique	DAT_min	DAT_max
58189	305582	BLACK-FIGURE	-525	-475
57000	303059	BLACK-FIGURE	-500	-450
13553	14683	BLACK-FIGURE	-550	-500
87767	9008456	RED-FIGURE	-425	-375
39540	206711	RED-FIGURE	-500	-450
56372	302421	BLACK-FIGURE	-575	-525
50053	217944	RED-FIGURE	-400	-300
15293	16602	BLACK-FIGURE	-575	-525
49428	217319	RED-FIGURE	-450	-400
15732	17065	BLACK-FIGURE	-550	-500

How to Display a Range?

The table provides two dates for each object. The earliest possible dating (DAT_min) and the latest possible dating (DAT_max). In order to be able to process this to a density graph, which is the most elegant means of visualization for continuous distributions (see also Crema, Bevan, and Lake (2010), 1123 or Baxter and Cool (2016), 125–126). (At least if the goal is merely to evaluate changes over time and the probability of object groups dating to a certain point in time, and not to look at actual objects counts, which will be obscured.) The scale we use is only pseudo-continuous, as the data actually comprises of intervals (1 year, or 25 years, etc.), but we nonetheless treat it as a continuous one.

Objects that can be dated with greater confidence should have a larger impact on the overall visualization. The core function of this package (datsteps()) produces a column named either ‘weight’ or ‘probability’ which contains a value that reflects the quality of dating (‘weight’) or the probability of an object being dated to a certain year (‘probability’). For the calculation of ‘weight’, see (Weissova and Steinmann 2021). The formula for probability calculation is simply ‘1 / ((max - min) + 1)’.

In any case, the greater the time span, the lower the weight value or probability. This is a generally accepted method of aoristic analysis (Ratcliffe 2000, 671–72 with Fig. 1) and explained in greater detail in (Weissova and Steinmann 2021). In the case of a stepsize of 1, the weight value can be seen as an actual dating probability to the specific year. As an example, a coin dated between 36 and 37 CE will thus have a probability of 0.5 for the year 36, and 0.5 for the year 37. The calculation of probability does, however not work with greater stepsizes. It is possible to switch between both variants of calculation by using datsteps(x, calc = "weight") or datsteps(x, calc = "probability"). When the probability is calculated, there is an option to add a cumulative probability column (datsteps(x, calc = "probability", cumulative = TRUE)).

Secondly, every object is duplicated a number of times equal to the dating range divided by the stepsize-variable. Each duplicate has its own ‘DAT_step’ – one single value between the two extremes. The above mentioned weight variable is divided by the number of steps, so that each new fictional object or ‘date step’ counts only as a fraction of the actual object. With this, we hope to achieve a greater influence of closely dated objects and a higher emphasis on overlap.

This method will not be useful for dating specific contexts, since any concept of terminus post/ante quem is lost here, which is important on a smaller scale. It may however be applicable for contexts that are disturbed, as a noticeable overlap of objects could give an indication for the original tpq-date. The method is suitable also for the visualization of changes in trends over time, e.g. the popularity of pottery types, or overall peaks in occupation from survey data (Orton, Morris, and Pipe 2017, 5).

Other approaches, e.g. using the median date of each object, may in some cases produce similar outcomes, but create other problems. A lot of information is lost on the way when employing averaged or median data, as large amount of loosely dated objects will produce peaks at unreasonable values. (Consider a large amount of objects dated between 600 and 400 BCE all attributed to the year 500 BCE.)

Beazley$DAT_mean <- (Beazley$DAT_max + Beazley$DAT_min) / 2
library(ggplot2)
ggplot(Beazley, aes(x = DAT_mean, fill = Technique)) +
  geom_histogram(binwidth = 25, position = "dodge") + Plot_Theme + Plot_Fill

Employing dating steps will even out unreasonable peaks. Note especially the gap between -425 and -300 in the plot above, that is – in the plot below – filled with a constant amount of objects in each year. This is due to the data containing large amounts of objects dating from -400 to -300 BCE. Of course, due to duplicating each object numerous times (see table below), the counts represented on the y-axis now inform us of the maximum amount of objects that might be dated to a given bin rather than the actual amount.

The method datplot uses for partitioning steps an be explained as follows: If any given object had been dated as ‘475 to 425 BC’ and steps of 25 years were used for each object, this would result in the creation of three sub-objects with the respective dating of: 475 BC, 450 BC, 425 BC. For an interval of 5 years, the same object would be partitioned into 11 sub-objects, each dating to 475 BC, 470 BC, 465 BC, 460 BC, 455 BC, 450 BC, 445 BC, 440 BC, 435 BC, 430 BC and 425 BC respectively. In order to keep the data comparable, the interval (expressed as the stepsize-variable in the R-package) must be chosen globally for all objects. As the time span an object is dated to can in many cases not be divided without remainder by the value given as a stepsize (i.e. may only be divided with the modulus of the division being greater than 0), datplot resolves these conflicts by temporarily modifying the size of steps for the objects in question without deviating significantly from the size set beforehand, so that the output is representative and comparable. Objects which are dated to a time span undercutting the stepsize-variable can either be assigned to one year (for objects dated precisely to one year), to two years (for objects dated to a number of years less than or equal to 40% of the stepsize) or to three years (for objects dated to a number of years exceeding 60% of the stepsize).

system.time(result <- datsteps(Beazley, stepsize = 25))[3]
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
#> elapsed 
#>   0.047
system.time(result <- datsteps(Beazley, stepsize = 1))[3]
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
#> elapsed 
#>   0.144

Using larger step sizes is only recommended if the stepsize of 1 year leads to significant delays in processing while yielding no additional information. If one where to look at a total time span of 10000 years, step sizes of 1 no longer seem very reasonable or rather necessary, as such detailed changes should not be visible or graspable at all, and using steps of 100 years will lead to the same outcome. As classical archaeologists a duration of more than 1000 years mostly exceeds our area of study. We seem to get good results with stepsizes between 1 and 25.

library(datplot)
result <- datsteps(Beazley, stepsize = 25)
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
ggplot(result, aes(x = DAT_step, fill = variable)) +
  geom_histogram(binwidth = 25, position = "dodge") + Plot_Theme + Plot_Fill

datsteps() can also calculate a stepsize on its own. It equals the closest possible dating of any object. As the stepsize used is saved in an attribute, this can be recycled to dynamically chose the binwidth for histograms:

result <- datsteps(Beazley, stepsize = "auto")
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
#> Using stepsize = 50 (auto).
ggplot(result, aes(x = DAT_step, fill = variable)) + Plot_Theme + Plot_Fill +
  geom_histogram(binwidth = attributes(result)$stepsize, position = "dodge")

ID	variable	DAT_min	DAT_max	weight	DAT_step
10957	BLACK-FIGURE	-550	-500	0.02	-550
10957	BLACK-FIGURE	-550	-500	0.02	-500
352414	RED-FIGURE	-525	-475	0.02	-525
352414	RED-FIGURE	-525	-475	0.02	-475
220668	RED-FIGURE	-450	-400	0.02	-450
220668	RED-FIGURE	-450	-400	0.02	-400

As the sample we use here only encompasses objects dated with a resolution of 50 years, we it would be best to also proceed with this stepsize. A smaller stepsize would not paint a more detailed picture, it just uses up computing time without any benefit.

Due to the impossibility of displaying object counts as well, it is ideal to use kernel density estimates for visualization. The density plot below shows the result. The peak at around -500 indicates that is area has the highest overlay, so a large part of the objects in our sample have been dated around this time. The same distribution can also be seen in the bar plots above. This, however, is not yet very informative.

result <- datsteps(Beazley, stepsize = 25)
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
dens <- result
dens <- scaleweight(result, var = "all")
dens <- density(x = dens$DAT_step, weights = dens$weight)
#> Warning in density.default(x = dens$DAT_step, weights = dens$weight): Selecting
#> bandwidth *not* using 'weights'
plot(dens)

Scaling the Weight along Groups of Objects

In order to display the objects separated into groups, the weights first have to be scaled along group membership, so that the sum of all weights in a group will equal 1. datplots function scaleweight() does exactly that for a dataframe as it was returned by datsteps(). A column that contains the variables for group membership needs to indicated.

result <- scaleweight(result, var = 2)

ID	variable	DAT_min	DAT_max	weight	DAT_step
10957	BLACK-FIGURE	-550	-500	0.0006993	-550
10957	BLACK-FIGURE	-550	-500	0.0006993	-525
10957	BLACK-FIGURE	-550	-500	0.0006993	-500
352414	RED-FIGURE	-525	-475	0.0006506	-525
352414	RED-FIGURE	-525	-475	0.0006506	-500
352414	RED-FIGURE	-525	-475	0.0006506	-475

Plots for the Distribution of Objects across Time

In the case of the Beazley archives data (Oxford n.d.) we can clearly see what we knew before: Black-figure pottery is older than red-figure pottery. (The data are from a random sample of athenian pottery from the Beazley archive, n = 1000.)

ggplot(data = result, aes(x = DAT_step,
                          fill = variable,
                          weight = weight)) +
  geom_density(alpha = 0.5) +
  xlab("Dating") + Plot_Theme + Plot_Fill

In case of this data, the changes between showing and omitting the weight are negligent. This, however, is not the case with more heterogeneously dated sets of objects (see the “Data Preparation and Visualization”-vignette). Please note that – even with the Beazley-data – the plot, however little, does change when the weights are omitted:

ggplot(data = result, aes(x = DAT_step,
                          fill = variable)) +
  geom_density(alpha = 0.5) +
  xlab("Dating") + Plot_Theme + Plot_Fill

When every step is valued equally, a lot of steps fall into the end of the 4th century (as mentioned above), since they were dated as e.g. “-400 to -300”. The impact here is not very huge, as the dating ranges of the objects do not vary greatly. However, the differences can be very dramatic for more heterogeneous data.

To quickly illustrate that we can plot two versions of the density from the Inscriptions of Bithynia data included in the package:

data("Inscr_Bithynia")
Inscr_Bithynia <- na.omit(Inscr_Bithynia[, c(1, 3, 8, 9)])
result_bith <- scaleweight(datsteps(Inscr_Bithynia, stepsize = "auto"),
                           var = "all")
#> Using 'weight'-calculation (see https://doi.org/10.1017/aap.2021.8).
#> Using stepsize = 1 (auto).
#> DAT_min and DAT_max at Index: 57, 68, 120, 173, 187, 238, 299, 300, 311, 312, 588, 590, 599, 679, 794, 798, 799, 828, 831, 833, 834, 837, 841, 878, 879, 908, 909, 914, 915, 931, 932, 933, 937, 938, 941, 942, 997, 1051, 1064, 1067, 1130, 1307, 1308, 1310, 1322, 1323, 1324 have the same value! Is this correct? If unsure, check your data for possible errors.

ggplot(result_bith, aes(x = DAT_step)) + Plot_Theme + Plot_Fill +
  geom_density(alpha = 0.5, fill = "grey30") + xlab("Dating")


ggplot(result_bith, aes(x = DAT_step, weight = weight)) +
  Plot_Theme + Plot_Fill +
  geom_density(alpha = 0.5, fill = "grey30") + xlab("Dating")

Including Histograms for an Impression of Quantity

We also added a function that calculated the value needed to scale the density curve to the axis of a Histogram of the dating steps. Please note that the histogram will not show the actual objects counts, but the counts of the maximum possible objects dated to the corresponding year resp. bin. The value to scale the density curve for a combined plot with a histogram can be obtained via get.histogramscale:

histogramscale <- get.histogramscale(result)

ggplot(result, aes(x = DAT_step, fill = variable)) + Plot_Theme + Plot_Fill +
  stat_density(alpha = 0.5, position = "dodge",
               aes(y = (after_stat(density) * histogramscale), weight = weight)) +
  geom_histogram(alpha = 0.5, binwidth = attributes(result)$stepsize,
                 position = "dodge") +
  labs(y = "maximum number of objects per year", x = "Dating")
#> Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel =
#> kernel, : Selecting bandwidth *not* using 'weights'

#> Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel =
#> kernel, : Selecting bandwidth *not* using 'weights'
#> Warning: Width not defined
#> ℹ Set with `position_dodge(width = ...)`

The combination of density curve and histogram also shows the common problem of histograms. Their output depends significantly on where the first bin is placed and may show a skewed distribution especially for roughly dated objects. Additionally, histograms also depend greatly on the binwidth and may not display any meaningful pattern when small bins are chosen (Shennan 1988, 25–26; Baxter and Beardah 1996).

The smooth curves of kernel density estimates are a more realistic approach to dating. The production of objects was as continuous as their use, so it seems only reasonable to display it in a more continuous fashion on a flexible timescale.

Weights and Cumulative Weights

With the update to v1.1.0 there are new option for calculation. When using a stepsize of 1 and calc = "probability", the weights now reflect the true probability for each objects dating into the specific year. With this, the probability may be used as an indicative value itself and not only as a helper for visualization. Many thanks to Christian Gugl for this suggestion.

This now opens the possibility to calculate the cumulative probability for each successive year – also suggested by Christian Gugl –, which is useful in the case of numismatic research, where it reflects the rising probability of minting. The cumulative probability has been introduced as an option in v1.1.0 and is calculated in a separate column when setting cumulative = TRUE in the datsteps()-function. Please note that the cumulative probability does not work (and does not mean anything) for stepsizes larger than 1!

data("Inscr_Bithynia")
Inscr_Bithynia <- na.omit(Inscr_Bithynia[, c(1, 3, 8, 9)])
Inscr_Bithynia <- Inscr_Bithynia[sample(seq_len(nrow(Inscr_Bithynia)), 5), ]
Inscr_Bithynia_steps <- datsteps(Inscr_Bithynia, 
                                 stepsize = 1, 
                                 calc = "probability", 
                                 cumulative = TRUE)
#> Using step-wise probability calculation.

ggplot(Inscr_Bithynia_steps, aes(x = DAT_step, y = cumul_prob, fill = variable)) + 
  geom_col() + facet_wrap(. ~ ID, ncol = 1) +
  labs(y = "Cumulative Probability", x = "Dating", fill = "Origin") + 
  theme(legend.position = "bottom")

References

Baxter, Mike J., and Christian C. Beardah. 1996. “Beyond the Histogram – Improved Approaches to Simple Data Display in Archaeology Using Kernel Density Estimates.” Archeologia e Calcolatori 7 (1): 397–408.

Baxter, Mike J., and H. E. M. Cool. 2016. “Reinventing the Wheel? Modelling Temporal Uncertainty with Applications to Brooch Distributions in Roman Britain.” Journal of Archaeological Science 66 (February): 120–27. https://doi.org/10.1016/j.jas.2015.12.007.

Crema, Enrico R., Andrew Bevan, and Mark W. Lake. 2010. “A Probabilistic Framework for Assessing Spatio-Temporal Point Patterns in the Archaeological Record.” Journal of Archaeological Science 37 (5): 1118–30. https://doi.org/10.1016/j.jas.2009.12.012.

Johnson, Ian. 2004. “Aoristic Analysis: Seeds of a New Approach to Mapping Archaeological Distributions Through Time.” In 2003 - Enter the Past. The e-Way into the Four Dimensions of Cultural Heritage. CAA 2003. Computer Applications and Quantitative Methods in Archaeology, edited by W. Börner and M. Goriany. Oxford: Archaeopress. https://publikationen.uni-tuebingen.de/xmlui/handle/10900/60663.

Orton, David, James Morris, and Alan Pipe. 2017. “Catch Per Unit Research Effort: Sampling Intensity, Chronological Uncertainty, and the Onset of Marine Fish Consumption in Historic London.” Open Quaternary 3 (January): 1. https://doi.org/10.5334/oq.29.

Oxford, University of. n.d. “Beazley Archive Pottery Database (BAPD).” Accessed April 27, 2018. https://www.carc.ox.ac.uk/carc/pottery.

Ratcliffe, Jerry H. 2000. “Aoristic Analysis: The Spatial Interpretation of Unspecific Temporal Events.” International Journal of Geographical Information Science 14 (7): 669–79. https://doi.org/10.1080/136588100424963.

Shennan, Stephen. 1988. Quantifying Archaeology. Edinburgh: Edinburgh University Press.

Weissova, Barbora, and Lisa Steinmann. 2021. “Datplot: A New r-Package for the Visualization of Date Ranges in Archaeology.” Advances in Archaeological Practice 9 (7): 288–98. https://doi.org/10.1017/aap.2021.8.