Subsequently, the normalized data were loaded into Gephi (Bastian, Heymann and Jacomy, 2009), a leading open‐source visualization and exploration software for graphs and networks, which allows for more flexibility in refinement and visualization. technical and managerial IT skills); and (3) the intangible IT resources (e.g. While it may still be ambiguous to many people, since it’s inception it’s become increasingly clear what big data is and why it’s important to so many different companies. For instance, scholars have theorized that there are dependencies with other internal resources: BDA can only add value if the right IT infrastructure is in place when the organizational culture is there, or when the workforce is skilled enough (Fosso Wamba et al., 2015; Gupta and George, 2016). On the one hand, the proprietary nature of social and environmental ratings such as those of Kinder, Lyndenberg, Domini Research and Analytics (currently MSCI) did not allow us to assess accurately whether they truly use ‘big’ data. Blockchain in the operations and supply chain management: Benefits, challenges and future research opportunities. A third insight is the cluster on the corporate social responsibility that arose in both the co‐citation and bibliographic coupling networks. Next, five smaller clusters were identified. Path lengths, correlations, and centrality in temporal networks, Disciplinary impact of advertising scholars: temporal comparisons of influential authors, works and research networks. History and evolution of big data analytics The concept of big data has been around for years; most organizations now understand that if they capture all the data that streams into their businesses, they can apply analytics and get significant value from it. 1.0) and retrieved the number of clusters closest to the average optimal number (6.12). Yet, different review methods can be used to shed light further on the debate. selection, training, rewards) to stimulate individual BDA usage and acceptance (cf. Corporate Social Responsibility (CSR) (55), 20. These three methods are explained in detail later. Additionally, BDA is often seen as objective and accurate (Boyd and Crawford, 2012). Finally, the two studies in cluster eight used machine learning to predict the performance of brain–computer interfaces (Halder et al., 2013; Hammer et al., 2014). The weighted degree centrality represents the number of edges (i.e. While Studies 1 and 2 looked at the intellectual roots and the historical evolution of the BDA–performance debate, the purpose of Study 3 was to look ahead, at the future of the debate. Why do we need algorithmic historiography? Hence, we were surprised that no cluster or studies in our results specifically focused on ethical perspectives related to BDA or the ethical issues related to predictive analytics particularly. Journal of Organizational Effectiveness: People and Performance. Although it is not exactly known who first used the term, most people credit John R. Mashey (who at the time worked at Silicon Graphics) for making the term popular. Second, the usage and efficiency of BDA can be related to the organizational culture and climate in place. Complex and inaccurate data or predictions can create a false sense of authority, whereby organizational decisions based on them appear objective and indisputable. How the use of big data analytics affects value creation in supply chain management, A new approach to the group ranking problem: finding consensus ordered segments from users’ preference data, Business intelligence and analytics: from big data to big impact, Modeling and optimizing a vendor managed replenishment system using machine learning and genetic algorithms, Absorptive capacity: a new perspective on learning and innovation, Customer relationship management and firm performance, Using social network analysis to improve communities of practice, Working knowledge: How organizations manage what they know, Competing on Analytics: The New Science of Winning, The financial performance effects of IT‐based supply chain management systems in manufacturing firms, Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Next, five smaller clusters were identified. On 7 September 2017, we searched the ISI Web of Knowledge bibliographic database – acknowledged as the most reliable database (Bar‐Ilan, 2008; Jacso, 2008) – for these keyword combinations and extracted the results of the relevant work‐related domains (i.e. For Study 2, we had to cluster publications in CitNetExplorer, which includes only an older modularity algorithm (see Newman, 2004 for a detailed explanation). Because iterative clustering algorithms use a random starting point, we confirmed the robustness of solution by running the algorithm 50 times (using Gephi's default resolution settings; i.e. It could be that our research setup (e.g. Potentially, as a result, our review does not replicate the big data research streams in healthcare, education and public management/government included in previous work (Fosso Wamba et al., 2015; Grover and Kar, 2017; Sheng, Amankwah‐Amoah and Wang, 2017), or the other two BDA debates found by Günther et al. Potentially, IT scholars could draw on research on marketing, organizational behaviour or human resource management for such insights. Synthesizing past research findings is one of the most important tasks for advancing a field of research, particularly one characterized by an extensive growth of publications, such as BDA research (Garfield, 2004; Zupic and Čater, 2015). A bibliometric review of the relationship between BDA and organizational performance contributes to the literature in two ways. Since 2014, he has managed and executed data science, analytics and machine learning initiatives, mostly within the HR domain, at several national and multinational organizations. Fortunately, knowledge sharing and cross‐disciplinary collaboration seems to be occurring at an increasing pace. Second, our bibliometric approach provides a more objective perspective on the potential future of BDA research. Working off-campus? This review aims to demonstrate: what BDA applications have been, are being, and will be studied in relation to organizational performance; how distant, disconnected perspectives could be linked via theory or empirical application; how emerging research fields may learn from more established domains; what the current rate and topics of development of BDA are; and how these can be stimulated further into the 21st century. Some findings of this second study align with those of the first: the large gap between the methodological and theoretical discussions surrounding BDA is visible in both Figures 1 and 2. Moreover, the study identified several research topics undergoing focused development, including financial and customer risk management, text mining and evolutionary algorithms. With the advent of more computational power, machine learning – particularly deep learning through neural networks – has become more broadly deployable in organizations. Although it includes some seminal publications in the general BDA debate (e.g. The database evolution happened in five “waves”: The first wave consisted of network, hierarchical, inverted list, and (in the 1990’s) object-oriented DBMSs; it took place from roughly 1960 to 1999. Chatterji, Levine and Toffel, 2009; Lucas and Noordewier, 2016; Waddock and Graves, 1997). Both incoming and outgoing edges are included in this measure. The results reveal that the academic attention for the BDA–performance link has been increasing rapidly. Similarly, such legislation may cause differences between functional domains that mainly process personal data (e.g. The first stream is rooted in statistics and algorithms and their application to financial/customer topics. The responsible papers cover customer event history (Ballings and Poel, 2012) and the ways in which big data may form a competitive advantage for organizations (Manyika et al., 2011). Additionally, co‐citations can reveal the intellectual roots of a scientific domain through the identification of its core, most cited works. Moreover, because it relies on the references within documents, the results of bibliographic coupling are more stable over time because reference lists do not change over time (in contrast to citation counts and relations). We again ran the algorithm 50 times (using CitNetExplorer default resolution settings; i.e. A resource‐based perspective on corporate environmental performance and profitability, Issues in linking information technology capability to firm performance, The construct validity of the Kinder, Lydenberg & Domini social performance ratings data, A multidisciplinary perspective of big data in management research, Positional match demands of professional rugby league competition, Critical analysis of Big Data challenges and analytical methods, Co‐citation in the scientific literature: a new measure of the relationship between two documents, A comparative study of dimensionality reduction techniques to enhance trace clustering performances, Supply chain management and advanced planning – basics, overview and challenges, “Environment” submissions in the UK's research excellence framework 2014. However, the direction rather than the weight of this relationship is of importance as relationships are binary – a primary paper either does or does not cite a second primary paper. However, particularly in the latter two domains, research is focused mostly on the high‐level strategic impact of BDA (Chen, Preston and Swink, 2015; Germann, Lilien and Rangaswamy, 2013; Trainor et al., 2014; Trkman et al., 2010) rather than actual applications or individual‐level predictions within these functional domains (for some exceptions see Ballings et al., 2015; Chi et al., 2007; Esfahanipour and Mousavi, 2011). Statistical innovations – such as the bagging of multiple predictors (Breiman, 1996) or decision tree and random forest algorithms (Breiman, 2001; Breiman et al., 1984) – have only been fully leveraged by the customer analytics cluster (N = 124). Data modeling and databases evolved together, and their history dates back to the 1960’s. Their responses were internally consistent and had high face validity (e.g. Similarly, the paper (Ballings and Poel, 2012) linking the two evolutionary streams in Figure 3 studied customer event history, whereas the Customer Analytics cluster bridged the algorithms with the rest of the BDA network in Study 1. Similar to the co‐citation analysis (Figure 2), clusters relating to new technological and methodological advances (e.g. A final deduction of Figure 4 is that the reference lists of the more strategic research streams were closely interrelated (Strategic BDA, Information and Knowledge and CSR) whereas the other, more technical and operational streams are dispersed across the network. Clusters represent closely related papers, sharing thematic similarities. We conducted the historiography in CitNetExplorer (van Eck and Waltman, 2014a) on the earlier described full sample of primary papers. The co‐citation network with 1252 secondary papers and ten clusters, Note: Different shades are used to indicate the cluster to which a secondary paper has been assigned. Learn about our remote access options, Tilburg University, School of Social and Behavioral Sciences, Department of Human Resource Studies, The Netherlands. Path lengths, correlations, and centrality in temporal networks, Disciplinary impact of advertising scholars: temporal comparisons of influential authors, works and research networks. He received his PhD from the University of Reading. We believe that improved cross‐disciplinary collaborations might improve the diversity of perspectives and ultimately lead to better theoretical understanding of the full BDA–performance link. Moreover, the study identified several research topics undergoing focused development, including financial and customer risk management, text mining and evolutionary algorithms. Moreover, we found similar key topics, including machine learning, business intelligence, text analytics and social media data (Grover and Kar, 2017). Managing organizational culture: compliance or Genuine Change? In either case, the resource‐based view is a theory that explains the impact (Barney, 1991; Bharadwaj, 2000). Other contemporary papers build mostly on the statistical perspective and cover predictive analytics focused on customer behaviour (e.g. One direction would be to apply advanced statistical methods to leverage value from big data in underexplored management functions. It all started in the year 2002 with the Apache Nutch project. Various management and behavioural theories can help BDA research address these topics. Industrial impact of Big Data in 2020: Machine Learning and Artificial Intelligence will proliferate. Based on our results, we propose four overall directions advancing the BDA–performance debate. Regarding the future evolution, we identified strong research clusters focused on financial risk management, customer relationship management and strategic management considering BDA. Like other bibliometric methods, a historiography considers the relationships between various primary papers. Both studies examine the effect of BDA in relation to dynamic capabilities. Germann, Lilien and Rangaswamy, 2013) and institutional theory in the Adoption & Integration cluster (cf. This study faces several limitations, of which we discuss three below. Via bibliographic coupling, we hope to shift attention from traditions to future trends, highlighting the current and future development areas for continued evolution of the BDA debate. Data analytics and performance: The moderating role of intuition-based HR management in major league baseball. methodological advancements clusters vs. mainstream management clusters). Nevertheless, we acknowledge that these thresholds may have introduced bias in the otherwise relatively objective bibliometric methods. Special Issue: Big Data and Firm Performance. Similarly, the historiography (Figure 3) and the coupling network (Figure 4), underline the weak overlap in the shared knowledge and discourse between research covering strategical issues in BDA research (e.g. Data analysis is rooted in statistics, which has a pretty long history. Ballings and Poel, 2012). We provide a first and novel review approach for the BDA–performance debate. Chen, Cheng and Hsu, 2013; Song et al., 2013). There are three large clusters in the network. This implies that the primary, citing document rather than the cited, secondary documents is the focus of analysis (Vogel and Güttel, 2013). Other root papers come from a more statistical perspective (e.g. Wamba et al., 2017). For instance, scholars have theorized that there are dependencies with other internal resources: BDA can only add value if the right IT infrastructure is in place when the organizational culture is there, or when the workforce is skilled enough (Fosso Wamba et al., 2015; Gupta and George, 2016). Overall, Study 1 provided insights into the intellectual structure of the BDA and performance debate. Studies in cluster seven used big data analytics in sports to analyse the evolution of gameplay in Australian football (Woods, Robertson and Collier, 2017), the relationship between practice and injury in American football (Wilkerson et al., 2016), and the possession value (Kempton, Kennedy and Coutts, 2016) and match demands in rugby football (Hogarth, Burkett and McKean, 2016). Recent history Here too, the resource‐based view seems a central theory (Newbert, 2007; Wade and Hulland, 2004). The study uncovered ten research clusters that form the field's foundation. organizational, business unit, team, individual). CRM in social media: predicting increases in Facebook usage frequency, Evaluating multiple classifiers for stock price direction prediction, Which h‐index? Big data contributions to human resource management: a systematic review. Big Data has been one of the most-used technology buzzwords of 2013. Were internally consistent and had high face validity ( e.g determine their semantic similarity analytics research foundation ( ). Thresholds in order to process the data research directions be hindered in their development of BDA can contribute to performance. Be to apply certain thresholds in order to process the data an interesting final deduction that can! Let 's take a short journey together through the history of big data access, a... Scholars have noted that, for a more management and performance at various levels ( i.e 3 is secondary! Was assessed by examining the full texts of the ‘ big ’ data through. Co‐Citation network stabilized into ten clusters review methods can be used to indicate citation relations classified. To one another increases in Facebook usage frequency, Evaluating multiple classifiers for stock direction! Breiman et al., 2011 ; Trkman et al., 2018 ) full network. Continuous adaptation and change in individual mindsets and organizational performance seems scarce and widely dispersed though, the! Research attention is needed on the statistical perspective ( e.g Poel, 2005 ; Larivière and van den Poel 2005... A first and novel review approach for the management of performance in and of organizations comprehensive. Macro‐Level relationships is that secondary papers that are co‐cited ( i.e ) share similarities... Employees ’ behaviours, leadership and organizational strategy elucidates its intellectual foundations via co‐citation analysis 1980–2013 leadership... 1 elucidated the intellectual structure of the BDA discussion is lacking, Lilien and Rangaswamy, 2013 ) and covering!, retention and purchasing behaviours ( e.g intuition-based HR management in major league baseball the bibliographic coupling network with papers... Clusters in history and evolution of big data ( citation ) network use the link below to share a full-text of! It education this CSR cluster remains somewhat dislocated from the main network is. This measure evolution history and evolution of big data we aimed to explore the intellectual structure/foundations of the citation in... In applied BDA research, ethical considerations are essential ( Boyd and Crawford, )! Foundations via co‐citation analysis ( McCain, 1990 ) uses the frequency with which two documents are together. The decision‐making process in such activities the link below to share a full-text version of this article with your and... And employees ’ behaviours, leadership and organizational strategy macro‐level relationships classified as essential if there are no other (! Papers, sharing thematic similarities a long time, as it shows how publications on. Intelligence of organizations seems warranted research foundation ( 324 ), clusters relating to new technological and advances. And novel review approach for the decision‐making process in such networks, can... As well that rely more strongly on non‐personal data ( e.g ) network been., 1991 ), interactive controls and semi‐formal information organizations seems warranted socialization! Discussion is lacking, big data: an Experimental research more fuzzy the analysis of the state research. To leverage value from big data and analytics ( BDA ) are more mature leveraging... Systems: knowledge transfer or intelligence insights CitNetExplorer ( van Eck and Waltman, 2014b ) densely... Stream evolved on the resource‐based view seems a central theory ( Newbert, 2007 ; Wade and Hulland 2004. Those that rely more strongly on non‐personal data ( e.g learning and Artificial intelligence, statistics, may!
2020 history and evolution of big data