This Vision Interview with Petr Knoth, Senior Research Fellow in Text and Data Mining at the Open University and Head of CORE (core.ac.uk), served as the opening segment of the NISO Hot Topic virtual conference, Text and Data Mining, held on May 25, 2022. Todd Carpenter spoke at length with Knoth about the many ways in which text and data mining affects both the present and the future, and about just how innovative this technology can be in meeting the needs of researchers in the information community. Continue reading on the Jisc Research blog.
The mission of the Big Scientific Data and Text Analytics Group (BSDTAg) is to advance the state of the art and develop new AI-powered technologies for the machine processing of scientific information.
We are committed to using AI for the public good. We carry out research that empowers the next generation of researchers to access, understand, interpret and build on open knowledge more effectively, in line with the principles of open science.
More specifically, we:
- apply AI to improve ways in which research is conducted;
- develop novel technologies enabling systematic analysis of research data and literature;
- create services to improve access to scientific information for all;
- do research on research;
- support the transition to and raise awareness of the benefits of open research;
- work with companies to help them derive value from research data and scientific information in areas as diverse as analysing trends, detecting misinformation and detecting plagiarism.
We firmly believe that scientific knowledge should be available to all, not just a privileged few. Open Access and Open Science are key drivers of equal access to information for everyone. Through our CORE service, we deliver credible scientific information to tens of millions of people from more than 260 countries each month (SDG 10: Reduced Inequalities). Additionally, Open Science is a key component in helping to ensure that everyone has equal access to robust scientific knowledge at every level, from high school students to postdoctoral researchers (SDG 4: Quality Education).
The world is currently facing a crisis of misinformation, which is having an impact in many areas, from politics to climate change and beyond. A well-informed society with access to reliable, trustworthy information is a cornerstone of a strong democracy (SDG 16: Peace, Justice and Strong Institutions). Our group provides free access to the largest collection of Open Access, peer-reviewed scientific literature, thus helping to ensure that accurate, reliable information is available to all. We are also developing AI-powered solutions for the public good that help people form their opinions and make decisions based on sound scientific evidence, protecting against conspiracy theories and misinformation in a wide range of areas, including medical, clinical and biomedical research (SDG 3: Good Health & Wellbeing).
Our research lies at the intersection of the following areas:
- Data science, natural language processing, machine learning, data mining, big data
- Information retrieval, information extraction, recommender systems, semantic web
- Open science, scientometrics, scholarly communication
- Industry: we advise, provide analytics and deliver new technologies for organisations and innovative industries in areas as diverse as misinformation checking and detection, research trend analysis, plagiarism detection, research impact evaluation, expert search and recruiting, academic search engines and literature-based discovery.
- Academic institutions: we deliver innovative tools and support academic institutions with analyses of their research outputs, open access compliance and trends, as well as comparisons with rival institutions in the context of research assessment exercises.
- Funders: we collect data from thousands of institutions and facilitate monitoring of open access compliance and reporting.
- Partner projects: we derive our reputation from strong collaboration with some of the most prestigious organisations in the area of scholarly communication.
- Boch, Michael; Gindl, Stefan; Barnett, Alan; Margetis, George; Mireles, Victor; Adamakis, Emmanouil; Knoth, Petr (2022). A Systematic Review of Data Management Platforms. In: WorldCIST'22, 12-14 Apr 2022, Budva, Montenegro
- Kunnath, Suchetha N.; Herrmannova, Drahomira; Pride, David; Knoth, Petr (2022). A Meta-analysis of Semantic Classification of Citations. Quantitative Science Studies, 2 (4), pp. 1170-1215
- Kusa, Wojciech; Hanbury, Allan; Knoth, Petr (2022). Automation of Citation Screening for Systematic Literature Reviews using Neural Networks: A Replicability Study. In: 44th European Conference on Information Retrieval, 10-14 Apr 2022, Stavanger, Norway. Springer, 13185, pp. 584-598
- Nambanoor Kunnath, Suchetha; Stauber, Valentin; Wu, Ronin; Pride, David; Botev, Viktor; Knoth, Petr (2022). ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations. In: Proceedings of the 13th Language Resources and Evaluation Conference, 20-25 Jun 2022, Marseille, France. Association for Computational Linguistics
- Kunnath, Suchetha N.; Pride, David; Herrmannova, Drahomira; Knoth, Petr (2021). Overview of the 2021 SDP 3C Citation Context Classification Shared Task. In: Second Workshop on Scholarly Document Processing (SDP), 10 Jun 2021, Mexico City, Mexico. Association for Computational Linguistics, pp. 137-145
- Taha, Abdel Aziz; Papariello, Luca; Bampoulidis, Alexandros; Knoth, Petr; Lupu, Mihai (2021). Formal Analysis and Estimation of Chance in Datasets Based on Their Properties. IEEE Transactions on Knowledge and Data Engineering (Early Access)
- Gyawali, Bikash; Anastasiou, Lucas; Knoth, Petr (2020). Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings. In: 12th Language Resources and Evaluation Conference, 11-16 May 2020, Marseille, France. European Language Resources Association, pp. 894-903
- Gyawali, Bikash; Pontika, Nancy; Knoth, Petr (2020). Open Access 2007 - 2017: Country and University Level Perspective. In: Joint Conference on Digital Libraries, 1-5 Aug 2020, Virtual Event, China
- Kunnath, Suchetha N.; Pride, David; Gyawali, Bikash; Knoth, Petr (2020). Overview of the 2020 WOSP 3C Citation Context Classification Task. In: Proceedings of the 8th International Workshop on Mining Scientific Publications, 05 Aug 2020, Wuhan, China. Association for Computational Linguistics, pp. 75-83
- Pride, David; Knoth, Petr (2020). An Authoritative Approach to Citation Classification. In: ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20), 1-5 Aug 2020, Virtual Event, China
- Herrmannova, Drahomira; Pontika, Nancy; Knoth, Petr (2019). Do Authors Deposit on Time? Tracking Open Access Policy Compliance. In: 2019 ACM/IEEE Joint Conference on Digital Libraries, 2-6 Jun 2019, Urbana-Champaign, IL, pp. 206-216
- Knoth, Petr; Anastasiou, Lucas; Cancellieri, Matteo; Gyawali, Bikash; Herrmannova, Drahomira; Misak, Sergei; Huba, Alexander; Pearce, Samuel; Pontika, Nancy; Rumyanceva, Svetlana; Tarasiuk, Maria (2019). Aggregating The World's Open Access Research Papers. Open Science Fair, Porto, Portugal
- Pride, David; Harag, Jozef; Knoth, Petr (2019). ACT: An Annotation Platform for Citation Typing at Scale [JCDL Poster Presentation]. In: JCDL 2019 - ACM/IEEE-CS Joint Conference on Digital Libraries 2019, 2-6 Jun 2019, Urbana-Champaign, Illinois
- Pride, David; Harag, Jozef; Knoth, Petr (2019). ACT: An Annotation Platform for Citation Typing at Scale. In: JCDL 2019 - ACM/IEEE-CS Joint Conference on Digital Libraries 2019, 2-6 Jun 2019, Urbana-Champaign, Illinois
- Herrmannova, Drahomira (2018). Mining Scholarly Publications for Research Evaluation. The Open University
- Herrmannova, Drahomira; Knoth, Petr; Patton, Robert (2018). Analyzing Citation-Distance Networks for Evaluating Publication Impact. In: 11th edition of the Language Resources and Evaluation Conference, 7-12 May 2018, Miyazaki, Japan
- Herrmannova, Drahomira; Knoth, Petr; Stahl, Christopher; Patton, Robert; Wells, Jack (2018). Research Collaboration Analysis Using Text and Graph Features. In: 19th International Conference on Computational Linguistics and Intelligent Text Processing, 18-24 Mar 2018, Hanoi, Vietnam
- Herrmannova, Drahomira; Knoth, Petr; Stahl, Christopher; Patton, Robert; Wells, Jack (2018). Text and Graph Based Approach for Analyzing Patterns of Research Collaboration: An analysis of the TrueImpactDataset. In: 1st Workshop on Computational Impact Detection from Text Data (CIDTD), 8 May 2018, Miyazaki, Japan
- Herrmannova, Drahomira; Patton, Robert M.; Knoth, Petr; Stahl, Christopher G. (2018). Do citations and readership identify seminal publications? Scientometrics, 115 (1), pp. 239-262
- Khadka, Anita; Knoth, Petr (2018). Using citation-context to reduce topic drifting on pure citation-based recommendation. In: 12th ACM Conference on Recommender Systems, 02-07 Oct 2018, Vancouver, British Columbia, Canada. ACM Press, RecSys '18: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 362-366
- Labropoulou, Penny; Galanis, Dimitrios; Lempesis, Antonis; Greenwood, Mark; Knoth, Petr; Eckart de Castilho, Richard; Sachtouris, Stavros; Georgantopoulos, Byron; Anastasiou, Lucas; Martziou, Stefania; Gkirtzou, Katerina; Manola, Natalia; Piperidis, Stelios (2018). OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content. In: 7th International Workshop on Mining Scientific Publications, 7-12 May 2018, Miyazaki, Japan. European Language Resources Association (ELRA)
- Pride, David; Knoth, Petr (2018). Peer review and citation data in predicting university rankings, a large-scale analysis. In: Theory and Practice of Digital Libraries (TPDL) 2018, 10-13 Sep 2018, University of Porto, Portugal
- Cancellieri, Matteo; Pontika, Nancy; Pearce, Samuel; Anastasiou, Lucas; Knoth, Petr (2017). Building scalable digital library ingestion pipelines using microservices. In: MTSR 2017: 11th International Conference on Metadata and Semantics Research, 28 Nov - 1 Dec 2017, Tallinn, Estonia
- Herrmannova, Drahomira; Patton, Robert; Knoth, Petr; Stahl, Christopher (2017). Citations and Readership are Poor Indicators of Research Excellence: Introducing TrueImpactDataset, a New Dataset for Validating Research Evaluation Metrics. In: 1st Workshop on Scholarly Web Mining, 10 Feb 2017, Cambridge, UK. ACM, pp. 41-48
- Knoth, Petr; Anastasiou, Lucas; Basile, Giorgio; Pearce, Samuel; Pontika, Nancy (2017). Machine accessibility of Open Access scientific publications from publisher systems via ResourceSync. OAI10
- Knoth, Petr; Anastasiou, Lucas; Charalampous, Aristotelis; Cancellieri, Matteo; Pearce, Samuel; Pontika, Nancy; Bayer, Vaclav (2017). Towards effective research recommender systems for repositories. In: Open Repositories 2017, 26-30 Jun 2017, Brisbane, Australia
- Knoth, Petr; Gooch, Phil; Jack, Kris (2017). What Others Say About This Work? Scalable Extraction of Citation Contexts from Research Papers. Lecture Notes in Computer Science, 10450, pp. 287-299
- Knoth, Petr; Khadka, Anita (2017). Can we do better than co-citations? Bringing Citation Proximity Analysis from idea to practice in research articles recommendation. In: 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), 11 Aug 2017, Tokyo, Japan. CEUR Workshop Proceedings, 1888, pp. 14-25
- Pontika, Nancy; Knoth, Petr; Anastasiou, Lucas; Charalampous, Aristotelis; Cancellieri, Matteo; Pearce, Samuel; Bayer, Vaclav (2017). The uptake of the CORE recommender in repositories. Open Repositories 2017
- Pride, David; Knoth, Petr (2017). Incidental or influential? – A decade of using text-mining for citation function classification. In: 16th International Society of Scientometrics and Informetrics Conference, 16-20 Oct 2017, Wuhan, China
- Pride, David; Knoth, Petr (2017). Incidental or Influential? - Challenges in Automatically Detecting Citation Importance Using Publication Full Texts. In: 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, 18-21 Sep 2017, Thessaloniki, Greece. Springer, pp. 572-578
- Herrmannova, Drahomira; Knoth, Petr (2016). An Analysis of the Microsoft Academic Graph. D-Lib Magazine, 22 (9/10)
- Herrmannova, Drahomira; Knoth, Petr (2016). Semantometrics: Towards Fulltext-based Research Evaluation. In: Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, 19-23 Jun 2016, Newark, New Jersey, USA. ACM, pp. 235-236
- Herrmannova, Drahomira; Knoth, Petr (2016). Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking. In: Proceedings of the WSDM Cup 2016 - Entity Ranking Challenge, the 9th ACM International Conference on Web Search and Data Mining, 22-25 Feb 2016, San Francisco, CA, USA
- Herrmannova, Drahomira; Knoth, Petr (2016). Towards full-text based research metrics: Exploring semantometrics. Jisc
- Knoth, Petr; Pontika, Nancy (2016). Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? In: INTEROP2016, 23 May 2016
- Pontika, Nancy; Knoth, Petr; Cancellieri, Matteo; Pearce, Samuel (2016). Developing Infrastructure to Support Closer Collaboration of Aggregators with Open Repositories. LIBER Quarterly, 25 (4), pp. 172-188
- Shearer, Kathleen; Rodrigues, Eloy; Bollini, Andrea; Cabezas, Alberto; Castelli, Donatella; Carr, Les; Chan, Leslie; Humphrey, Chuck; Johnson, Rick; Knoth, Petr; Manghi, Paolo; Matizirofa, Lazarus; Perakakis, Pandelis; Schirrwagen, Jochen; Smith, Tim; Van de Sompel, Herbert; Walk, Paul; Wilcox, David; Yamaji, Kazu (2016). Next generation repositories: Scaling up repositories to a global knowledge commons. In: Open Repositories 2018, 4-6 Jun 2018
- Herrmannova, Drahomira; Knoth, Petr (2015). Semantometrics in Coauthorship Networks: Fulltext-based Approach for Analysing Patterns of Research Collaboration. D-Lib Magazine, 21 (11/12)
- Herrmannova, Drahomira; Knoth, Petr (2015). Semantometrics: Fulltext-Based Measures for Analysing Research Collaboration. In: Proceedings of the 15th International Conference of the International Society for Scientometrics and Informetrics, 29 Jun - 3 Jul 2015, Istanbul, Turkey. International Society for Scientometrics and Informetrics
- Knoth, Petr (2015). Linking Textual Resources to Support Information Discovery. The Open University
- Pontika, Nancy; Knoth, Petr (2015). Open Science Taxonomy. FOSTER
- Pontika, Nancy; Knoth, Petr; Cancellieri, Matteo; Pearce, Samuel (2015). Fostering Open Science to Research using a Taxonomy and an eLearning Portal. In: iKnow: 15th International Conference on Knowledge Technologies and Data Driven Business, 21-22 Oct 2015, Graz, Austria
- Kats, Pavel; Knoth, Petr; Mamakis, Georgios; Mielnicki, Marcin; Muhr, Markus; Werla, Marcin (2014). Design of Europeana Cloud technical infrastructure. In: Digital Libraries 2014 (2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL), 8-12 Sep 2014, London, UK, pp. 491-492
- Knoth, Petr; Anastasiou, Lucas; Pearce, Samuel (2014). My repository is being aggregated: a blessing or a curse? In: Open Repositories 2014 (OR2014), 9-13 Jun 2014, Helsinki, Finland
- Knoth, Petr; Herrmannova, Drahomira (2014). Towards Semantometrics: A New Semantic Similarity Based Measure for Assessing a Research Publication's Contribution. D-Lib Magazine, 20 (11/12)
- Knoth, Petr (2013). From open access metadata to open access content: two principles for increased visibility of open access content. In: Open Repositories 2013, 8-12 Jul 2013, Charlottetown, Prince Edward Island, Canada
- Knoth, Petr; Herrmannova, Drahomira (2013). Simple yet effective methods for cross-lingual link discovery (CLLD) - KMI @ NTCIR-10 CrossLink-2. In: NTCIR-10 Evaluation of Information Access Technologies, 18-21 Jun 2013, Tokyo, Japan, pp. 39-46
- Knoth, Petr; Zdrahal, Zdenek (2013). CORE: aggregation use cases for open access. In: 2nd International Workshop on Mining Scientific Publications (WOSP 2013), 26 Jul 2013, Indianapolis, IN
- Herrmannova, Drahomira; Knoth, Petr (2012). Visual search for supporting content exploration in large document collections. D-Lib Magazine, 18 (7/8)
- Knoth, Petr; Zdrahal, Zdenek (2012). CORE: three access levels to underpin open access. D-Lib Magazine, 18 (11/12)
- Knoth, Petr; Zdrahal, Zdenek; Juffinger, Andreas (2012). Guest editorial. D-Lib Magazine, 18 (7/8)
- Knoth, Petr; Robotka, Vojtech; Zdrahal, Zdenek (2011). Connecting repositories in the open access domain using text mining and semantic data. In: 15th International Conference on Theory and Practice of Digital Libraries: research and advanced technology for digital libraries (TPDL 2011), 26-28 Sep 2011, Berlin, Germany, 6966, pp. 483-487
- Knoth, Petr; Zdrahal, Zdenek (2011). Mining cross-document relationships from text. In: The First International Conference on Advances in Information Mining and Management (IMMM 2011), 23-28 Oct 2011, Barcelona, Spain
- Knoth, Petr; Zdrahal, Zdenek (2011). CORE: connecting repositories in the open access domain. In: CERN Workshop on Innovations in Scholarly Communication (OAI7), 22-24 Jun 2011, Geneva, Switzerland
- Knoth, Petr; Zilka, Lukas; Zdrahal, Zdenek (2011). KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia using explicit semantic analysis. In: NTCIR-9: The 9th NTCIR Workshop Meeting: Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, 6-9 Dec 2011, Tokyo, Japan
- Knoth, Petr; Zilka, Lukas; Zdrahal, Zdenek (2011). Using Explicit Semantic Analysis for Cross-Lingual Link Discovery. In: 5th International Workshop on Cross Lingual Information Access: Computational Linguistics and the Information Need of Multilingual Societies (CLIA) at The 5th International Joint Conference on Natural Language Processing (IJC-NLP 2011), 08-13 Nov 2011, Chiang Mai, Thailand
- Knoth, Petr; Collins, Trevor; Sklavounou, Elsa; Zdrahal, Zdenek (2010). Facilitating cross-language retrieval and machine translation by multilingual domain ontologies. In: Workshop on Supporting eLearning with Language Resources and Semantic Data (at LREC 2010), 22 May 2010, Valletta, Malta
- Knoth, Petr; Novotny, Jakub; Zdrahal, Zdenek (2010). Automatic generation of inter-passage links based on semantic similarity. In: Computational Linguistics (COLING 2010), 23-27 Aug 2010, Beijing, China, pp. 590-598
- Knoth, Petr; Schmidt, Marek; Smrz, Pavel; Zdrahal, Zdenek (2009). Towards a framework for comparing automatic term recognition methods. In: Conference Znalosti 2009, 4-6 Feb 2009, Brno, Czech Republic
- Schmidt, Marek; Knoth, Petr; Smrz, Pavel (2009). Information extraction in the KiWi Project. In: Znalosti 2009, 4-6 Feb 2009, Bratislava, Slovakia
- Knoth, Petr (2008). Extraction of semantic relations from texts. In: Conference and Student EEICT 2008, 24 Apr 2008, Brno, Czech Republic
- Opsomer, Rob; Knoth, Petr; van Polen, Freek; Trapman, Jantine; Wiering, Marco (2008). Categorizing children: automated text classification of CHILDES files. In: The 20th Belgian-Netherlands Conference on Artificial Intelligence (BNAIC 2008), 30-31 Oct 2008, Enschede, The Netherlands
The BSDTAg has led and participated in a variety of national and EU-funded projects, on topics ranging from analytics applications and services to text mining of scientific and scholarly data, digital libraries, open science, and responsible research and innovation.
Petr Knoth, Founder & Head of CORE
Lucas Anastasiou, Senior Developer
Viktoriia Borsuk, UX Designer
Valeriy Budko, Full Stack Developer
Matteo Cancellieri, Lead Developer (Backend)
Drahomira Herrmannova, Data Analyst & Researcher
Catherine Kuliavets, Personal & Team Administrative Assistant
Suchetha Nambanoor-Kunnath, PhD Student
Nancy Pontika, Open Access Specialist & Communications
David Pride, Research Associate
Maria Tarasiuk, Front-End Developer
Kostiantyn Voskoboinik, Software Developer
Anton Zhuk, Back-End Developer
Andrew Vasiliev, System Administrator
The Principles of Open Scholarly Infrastructure (POSI) offer a set of guidelines by which open scholarly infrastructure organisations and initiatives that support the research community can be operated and sustained. In this post, we demonstrate CORE's commitment to adhering to these principles and show our current progress towards these aims. The principles are divided into three main categories: Governance, Sustainability and Insurance. You can read more about it in our blog post.
The first quarter of the new year was very productive for the CORE team, with a number of new releases. First, we have been working hard to improve the website's user interface and user experience, as well as its technical performance, visual communication and usability. In January, we released a new homepage and redesigned the CORE services page. More about it can be found on the Jisc Research blog.
CORE has just released a major update to its search engine, including a sleek new user interface and upgraded search functionality driven by the new CORE API V3.0. CORE Search is the engine that researchers, librarians, scholars and others turn to for open access research papers from around the world and for staying up to date with the latest scientific literature. CORE constantly evaluates user feedback and integrates it into the ongoing roadmap for CORE's development. Working with our users and data providers to deliver a consistently improving user experience is a key component of CORE's ongoing success. All the details are described on the Jisc Research blog.
We are proud to announce that the work of our EU-funded project ON-MERRIT has been mentioned in a Nature news article. ON-MERRIT aims to analyse and deliver a set of evidence-based recommendations for science policies, indicators and incentives that could address and mitigate cumulative (dis)advantages in Open Science. The work of the Open University, a partner in this project, focuses on investigating the role of Open Science in promotion and tenure policies, practices and incentives within academia. At the OU, the project is led by the Big Scientific Data and Text Analytics Group (BSDTAg), with Dr. Petr Knoth (PI) supported by Dr. Nancy Pontika, David Pride and Matteo Cancellieri. Enjoy reading the full article on the Nature blog.
This post is our personal story of how members of the CORE team have been affected by, and caught up in, the armed conflict in Ukraine. It is one of many testaments to the implications of war and a plea to the Russian and Belarusian academic community to help stop this violence. Follow this link to read the full story on the CORE blog.
CORE and Iris.ai are extremely pleased to announce the start of a new research collaboration funded by the Norwegian Research Council. Discovering scientific insights about a specific topic is challenging, particularly in an area like chemistry, which is one of the five most published fields, with over 11 million publications and 307,000 patents. The team at Iris.ai have spent the last 5 years building an award-winning AI engine for scientific text understanding. Their patented algorithms for identifying text similarity, extracting tabular data and creating domain-specific entity representations make them world leaders in this domain. Follow this link to read the full blog post about this collaboration on the Jisc Research blog.
Our society is facing significant challenges due to widespread misinformation, in particular on social media, which substantially influences public opinion. As a result, we are seeing strong demand for innovative text-processing methods that fact-check claims and automatically assess trustworthiness and credibility. Machine learning and natural language processing have started to be widely used to address this problem. While scientific papers have traditionally been seen as a source of mostly trustworthy information, their use within automated tools in the fight against misinformation, such as claims about vaccine effectiveness or climate change, has been rather limited. Read the full blog post on the Jisc Research blog.
We are always excited to announce new releases of tools that support Open Access and use CORE services. This time, the release comes from our friends at Open Access Helper, a tool that helps everyone discover legal Open Access versions of research outputs around the web. What is new in this version is the app's ability to proactively notify researchers on their iPad and iPhone whenever they browse articles behind a paywall. We are really excited about this release because it integrates our brand-new CORE API (v3). Find out more about this integration on the Jisc Research blog.
On Thursday 13th January 2022, Petr Knoth, Head of CORE, and Matteo Cancellieri, Lead Developer, gave a webinar describing the new features of CORE APIv3, attended by 72 people. In the first part, we introduced the new API features; the second part provided live coding examples, followed by questions from the audience. Read more about this webinar on the Jisc Research blog.
We're delighted to announce a new partnership between CORE and Cypris, a leading AI-driven market intelligence platform that connects research and development (R&D) teams with innovation data and trends in their field. The partnership will provide Cypris with unlimited access to over 210 million open access articles, helping it further enhance its platform and regularly add live market data so that R&D teams have the most up-to-date research in their fields of interest. Continue reading this news on the Jisc Research blog.
The forest chatter has been clamorous since Microsoft announced that it would retire Microsoft Academic (MAG) at the end of 2021. Like many others, we at CORE have used MAG for a number of tasks, including enhancing and enriching data quality, obtaining citation data and supporting our research on the semantic typing of citations. We have also enriched MAG and Microsoft Academic Search by supplying direct links to full-text content (similar to what we do for PubMed). Continue reading this blog post on the Jisc Research blog.
We're delighted to announce a new partnership between CORE and the Arabic Digital Reform Institute (ADRI), which provides services that help researchers store, share and access Arabic academic research online. The partnership will provide ADRI with unlimited access to millions of open access articles, supporting its research platform and repository services for academics all over the world. Detailed information is available on the Jisc Research blog.
Since its start (10 years ago!), CORE's mission has been to aggregate Open Access scientific research and to facilitate access to it at an unprecedented scale, for both humans and machines. To achieve this aim, we are always refining and improving our methods for accessing and using CORE data. A key consideration in making improvements is that CORE users come from many different backgrounds and apply the CORE tools in a variety of use cases. At last count, users from over 40 broad industry types (including academic research, education, publishing, software and technology companies) were applying the CORE tools to their work across the world. Applications of CORE tools and data are growing and constantly changing. Read the continuation of this story on the Jisc Research blog.
It all started in 2010 when Petr Knoth, then a PhD student at the Knowledge Media Institute at the Open University, wanted to collect a large corpus of academic papers to explore related research content. It was a frustrating job: he realised not only that there was no readily available corpus of all research papers, but also that collecting this information for machine processing was particularly difficult. While reading about Open Access, he came up with the idea of a tool that harvests both metadata and full text from all research repositories on a global scale, enabling unrestricted access to all content. Continue reading this birthday story on the Jisc Research blog.
Much of the CORE team's focus involves developing services that underpin open research, and the updates for this half-year include numerous examples of this in action. You can find details about these and more news on the Jisc Research blog.
The need for automated methods of evaluating research has been gaining more attention lately. The primary motivation is to replace labour-intensive exercises such as peer review, as well as less sophisticated and less widely accepted ways of ranking research works, such as Impact Factors, which depend solely on citation frequency. One proposal is to use other aspects of citations, such as their function or importance, to redefine current research evaluation frameworks. The recently concluded 3C Citation Context Classification shared task, organised by researchers at the Knowledge Media Institute (KMi), The Open University, UK, and Oak Ridge National Laboratory (ORNL), US, is one such effort: it aims to provide a unifying platform for researchers in this domain and to push research further in this direction. Read the continuation on the Jisc Research blog.
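To illustrate why the function or importance of citations matters, here is a minimal Python sketch, assuming each citation has already been labelled as either incidental or influential. The `Citation` type, the scoring functions and the toy data are hypothetical and invented purely for illustration; they are not the 3C task's data or evaluation method. A plain citation count treats every citation equally, whereas an importance-aware count credits only influential citations, so the two metrics can rank the same papers differently.

```python
from dataclasses import dataclass


# Hypothetical record: one citation received by a paper, with an importance
# label assumed to have been produced by some citation classifier.
@dataclass(frozen=True)
class Citation:
    cited_paper: str
    importance: str  # "incidental" or "influential"


def raw_count(citations, paper):
    """Citation-frequency metric: every citation counts the same."""
    return sum(1 for c in citations if c.cited_paper == paper)


def influential_count(citations, paper):
    """Importance-aware metric: only citations labelled influential count."""
    return sum(
        1 for c in citations
        if c.cited_paper == paper and c.importance == "influential"
    )


def rank(papers, citations, metric):
    """Return papers sorted from highest to lowest score under `metric`."""
    return sorted(papers, key=lambda p: metric(citations, p), reverse=True)


# Toy data (invented): paper A is cited often but mostly in passing,
# paper B is cited less often but mostly in an influential way.
citations = (
    [Citation("A", "incidental")] * 8
    + [Citation("A", "influential")] * 2
    + [Citation("B", "incidental")] * 1
    + [Citation("B", "influential")] * 5
)
papers = ["A", "B"]

print(rank(papers, citations, raw_count))          # ['A', 'B']
print(rank(papers, citations, influential_count))  # ['B', 'A']
```

In practice, the labels would have to come from an automated citation classifier, which is the kind of problem the 3C shared task addresses.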
Dr Anna De Liddo (Senior Research Fellow and lead of the IDea research team) has been invited to become President-elect of the Special Interest Group (SIG) on Cognitive Science Research in IS of the Association for Information Systems for 2021-2023. Anna will join Luca Iandoli (President-in-office and Associate Dean for Global Programs and Research at St. John's University, New York) and Jia Shen (Incumbent President, Associate Professor and Chair of the Department of Information Systems at Rider University) in leading the SIG's activities for the next three years. The Association for Information Systems (AIS) is the premier professional association for individuals and organisations who lead the research, teaching, practice and study of information systems worldwide. A SIG is a community that shares the goal of advancing a specific area of knowledge in IS; SIGs allow members to connect with like-minded individuals to develop solutions within a particular area of expertise. SIGs are dedicated to researching, developing and disseminating knowledge based on extensive experience of particular topics in the management and organisation of IS. The SIG on Cognitive Research (SIG CORE) is specifically intended to represent and support researchers in information systems who view understanding human cognition as a critical component of the successful design and implementation of information systems. As such, this SIG focuses on IS problems in terms of knowledge, technology-mediated reasoning, perception and judgment. Becoming President of the SIG on Cognitive Research means Anna will help organise some of the leading AIS conferences, such as AMCIS 2021 and its subsequent editions in 2022 and 2023.
In particular, Anna, Luca and Jia will chair a track at AMCIS 2022 and organise a workshop on IS Cognitive Research at the International Conference on Information Systems (ICIS), the most prestigious gathering of information systems academics and research-oriented practitioners in the world. Joining SIG CORE and becoming an AIS member gives access to these annual conferences and workshops, to affiliated conferences and to other resources, such as libraries, journals, professional training and community news. Overall, this new role allows Anna to represent and support researchers in information systems who view understanding human cognition as critical to the successful design and implementation of information systems.
Flowcite has teamed up with CORE, the world's largest aggregator of open access research papers. The partnership will give Flowcite users free and unlimited access to millions of open access research papers from the CORE database. "CORE is delighted to partner with Flowcite and progress our aligned goals to make open research content available to all. By connecting our innovative solutions we continue to evolve the way research is being completed and increase the discoverability and usage of all research outputs," said Dr Petr Knoth, CORE Founder. "Expanding our Knowledge Library with CORE we provide our users with an opportunity to find relevant sources for their research in just one click. This integration doesn't simply improve our in-built Knowledge Library, but makes us closer to our global goal of creating one tool that takes care of everything, where you can read, annotate and cite unlimited papers simultaneously just like when working in a browser," said Guillaume Grust, Founder of Flowcite. Read more about this collaboration on the Jisc Research blog. Related Links: Read more.
From October to December 2020, CORE broke records, partnered with arXiv.org and continued improving its REF2021 compliance monitoring service. The CORE team had a busy end to 2020, concentrating on multiple areas, including collaboration with the open access community and new feature development. Find out more on the Jisc Research blog.
To enhance the experience of PubMed users, CORE now provides access to freely available full-text papers that were previously unavailable in PubMed. This is delivered via PubMed's LinkOut service. Read the CORE blog post to find out more about this integration. Continue reading on the Open Research blog.
KMi has joined the I4OA Stakeholders' Group, adding its support to the Open Abstracts Initiative. The initiative, launched on 24 September, calls on all scholarly publishers to open the abstracts of their publications, and specifically to distribute them to trusted repositories where they are open and machine-accessible, in order to facilitate large-scale access and promote the discovery of critical research. It is a joint collaboration between scholarly publishers, librarians, researchers, and infrastructure organisations, and KMi is very proud to be part of this large group promoting Open Abstracts. Indeed, our Scholarly Knowledge Modelling, Mining and Sense Making (SKM3) team relies heavily on abstracts for its analyses, and access to them is important to achieving its results. Take, for example, the AIDA Dashboard, one of the latest products developed by the SKM3 team: a tool for exploring and making sense of scientific conferences. The dashboard allows users to assess the research challenges that a conference is actually addressing, and to examine the trends of its relevant research topics and how its focus has changed over time. To enable this functionality, all research papers published in and presented at a conference were annotated using the CSO Classifier. This tool takes abstracts and titles as input and, through Natural Language Processing and Semantic Web techniques, identifies relevant topics drawn from the Computer Science Ontology. Abstracts have played a key role in the development of this innovative application. Currently, I4OA is supported by more than 60 publishers, including Cambridge University Press and MIT Press, which are committed to making their abstracts openly available by depositing them in Crossref. However, many major publishers, such as Elsevier, the American Chemical Society (ACS), the Institute of Electrical and Electronics Engineers (IEEE) and Springer Nature, are still hesitant. 
On the other hand, more than 60 stakeholders, including the Bill and Melinda Gates Foundation, CORE, the Center for Open Science, Harvard Library, UK Research and Innovation and many others, have expressed support for this initiative. KMi recognises the need for unrestricted availability of the abstracts of the world's scholarly publications and is thrilled to support the Open Abstracts Initiative. Related Links: Initiative Stakeholders AIDA Dashboard
KMi's CORE has reached a new milestone of 30 million monthly active users. This continues the already significant growth of CORE's user base in 2020: only in June 2020, CORE reported reaching 20 million monthly users. Read the detailed blog post here.
KMi's CORE team continues to work on improving CORE. This was a highly productive period for CORE in terms of growing and developing its products. Find out more in the blog post.
The first edition of the shared task organised by researchers at CORE, Knowledge Media Institute (KMi), The Open University, UK, focused on classifying citations for research impact analysis. The new shared task, known as the 3C Citation Context Classification task, was organised as part of the 8th International Workshop on Mining Scientific Publications (WOSP 2020) and hosted on Kaggle InClass, a free platform for hosting data science competitions. Find out more by following the link.
I4OA was launched this September, calling for an increase in the volume of open abstracts. Having found that much of the published literature either lacks open abstracts or has abstracts disseminated via proprietary platforms with reuse restrictions, I4OA calls on the publishing community to open up the abstracts of all published literature. Find out more in CORE's blog.
arXiv readers now have a faster way to find articles relevant to their interests. From an article abstract page, readers can simply activate the CORE Recommender to find additional open access research on similar topics. Read more at the arXiv.org blog.
CORE's mission is to increase the discoverability of open access research and to promote as widely as possible the content of our data providers, i.e., repositories, journals, and web resources. We currently collaborate with more than 10,000 data providers from around the world and are continuously looking for new ways to increase this number, so as to offer the most complete coverage possible of the world's open access content. More information about the new workflow for adding data providers and gaining access to the CORE Repository Dashboard can be found at Jisc Scholarly Communications. Related Links: Read more.
Despite the global situation caused by the pandemic and the ongoing changes, the second quarter of 2020 saw significant progress in the operation and development of CORE: new products were released and the team reached new milestones. Follow the link to read about:
1. 20 million monthly CORE users and the growth of CORE's worldwide rank
2. CORE Repository Dashboard and Repository Edition releases
3. CORE helping Lean Library to provide OA research papers
4. The CORE Ambassadors' network and its achievements
5. CORE Discovery and repositories
6. CORE team research accomplishments
7. CORE negotiations and partnerships
8. CORE statistics
Related Links: Read more.
Because of the global pandemic, this year's 8th International Workshop on Mining Scientific Publications (WOSP 2020) was organised entirely virtually. The workshop ran for a single day across four sessions, featuring keynote talks, presentations of accepted papers and a shared task on citation context classification. More details about the programme structure can be found here. This year the workshop was organised by CORE, The Open University, UK, in collaboration with Oak Ridge National Laboratory (ORNL), Tennessee, US. Find out more here. Related Links: Read more.
CORE is happy to announce the release of a new version of the CORE Repository Dashboard. The update will be of particular interest to UK repositories as we are releasing with it a new tool to support REF2021 open access compliance assessment. The tool was developed for repository managers and research administrators to improve the harvesting of their repository outputs and ensure their content is visible to the world. Full details here. Related Links: Read more.
Thousands of data providers from almost 150 countries around the world are connected with researchers, students, lifelong learners and the general public via CORE. This past month, CORE reached 20 million monthly users; we are really proud of this and grateful to all our content providers. Read more about this here. Related Links: Read more.
Members of the CORE Team have been working on submissions to the Joint Conference on Digital Libraries (JCDL), and today we are extremely happy to inform our readers that both of our submissions have been accepted. Dr. Bikash Gyawali, Dr. Nancy Pontika and Dr. Petr Knoth have been working on "Open Access 2007-2017: Country and University Level Perspective", while David Pride and Dr. Petr Knoth worked on another submission entitled "An Authoritative Approach to Citation Classification". Follow the link to find out more. Related Links: Read more.
CORE pursues its mission to make open access more visible and reusable by acting as an enabling infrastructure. This time, CORE has joined forces with Lean Library, whose aim is to provide seamless access to research materials for users. Through this collaboration, the CORE Discovery service will now be used indirectly by library systems that integrate Lean Library, thereby reaching more users. More information about this integration can be found here. Related Links: Read more.
As part of the International Workshop on Mining Scientific Publications, WOSP 2020 (https://wosp.core.ac.uk/jcdl2020/index.html), researchers at CORE are organising a new shared task: the '3C' Citation Context Classification Task. The aim of this shared task is to classify citation contexts in research publications based on their influence and purpose. The shared task has two subtasks, which will be hosted on Kaggle as separate competitions. Subtask A (https://www.kaggle.com/c/3c-shared-task-purpose) is a multi-class classification task in which citations are categorised into six classes based on their purpose. Subtask B (https://www.kaggle.com/c/3c-shared-task-influence) is a binary classification task based on citation influence. More information can be found in this blog post. Related Links: Read more.
At the end of March, CORE presented a webinar (slides and recording) on how UK HEIs can track compliance with the REF2021 open access policy. The webinar was fully booked and attended by 131 repository managers and research administrators from the UK Council of Research Repositories (UKCoRR) and the Association of Research Managers and Administrators (ARMA). During the webinar, the CORE Team presented the CORE Repository Dashboard, a tool designed specifically for research support staff, which provides useful information for tracking compliance with the REF2021 open access policy. Topics discussed during the webinar included:
- Deposit compliance
- Deposit time lag
- Publication dates
- RIOXX metadata
Read the full blog post to learn more and to access the slides and the webinar recording. Related Links: Read more.
Every day, millions of people access free OU content. Congratulations to all members, past and present, of our CORE and iTunes U teams for being recognised as two of the five Open Access sources mentioned on the OU's Mission Page. CORE, delivered in partnership by the Open University and Jisc, is the world's largest collection of open access research papers. CORE hosts over 19 million open access full-text papers and allows searching and accessing over 170 million research papers. KMi were instrumental in launching the iTunes U podcasts, in partnership with the Open University Learning and Teaching Solutions division. KMi are extremely proud to see these achievements acknowledged; the teams' hard work over many years has paid off, benefiting our students and exemplifying the OU's core values. Related Links: http://www.open.ac.uk/about/main/strategy-and-policies/mission
CORE is happy to keep its readers up to date, and here is its quarterly report for the January to March 2020 period. Read the CORE blog post, which covers:
- CORE is ready to release a premium version of the Repository Dashboard
- CORE's products are used by Open Access Helper
- CORE is continuously expanding its ambassadors' network
- CORE step-by-step guides
- CORE as an enabling infrastructure
- CORE statistics
Related Links: Read more.
Last Tuesday, March 3, we at CORE were privileged to welcome a leading figure in the quest for Open Access to scientific knowledge. Carl Malamud and Petr Knoth had a very productive discussion about their work and common goals, and shared their experiences. Carl Malamud also gave a talk at KMi on text and data mining of scientific journals. For more information about his talk, read here. Related Links: Read more.
Read about our work on going beyond mirroring content from our data providers to improve data quality. In our latest blog post, we present how we link CORE data to complementary scholarly sources and databases including Crossref, MAG, and ORCID. Related Links: Read more.
Claus Wolf, with CORE's support, has developed OA Helper, a brand new application that enables iOS users to search for scientific articles on their devices without hitting a paywall. Fascinated by Open Access and Open Source, Claus Wolf integrated CORE Discovery and CORE Recommender into the application. Claus Wolf says: "Open Access provides a level playing field on which innovation can be built and also serves as a field for learning. Creating a tool that would support Open Access for macOS & iOS users thus seemed like a worthwhile endeavour." The project also turned out to be a great learning opportunity for him. To install the application on your device, just visit the App Store. Related Links: Read more.
CORE supports Plan S in granting open access to scholarly outputs (such as publications) to anyone without any barriers or restrictions, including for most forms of use and re-use by humans and machines. Plan S is an initiative supported by the European Commission and various national public funding bodies ("cOAlition S") which, from 2020, will require that all articles by their grantees be published in immediate open access. Plan S was developed by Science Europe, the European association representing the interests of major public research performing and research funding organisations. Read more about this at the CORE Blog. Related Links: Read more.
At the 25th Anniversary KMi Festival we invited staff from across the OU campus to come and find out how our latest knowledge and media technologies are impacting education, science, and cities. The Festival attendees included Lady Kitty Chisholm, one of the three founders of KMi, the STEM Executive Dean, Nick Braithwaite, and the new CFO, Paul Traynor. Visitors had in-depth conversations with the research teams and tried out some hands-on exhibits. Among the fun demonstrations were the Knowledge Makers' Memory Game and the Citizen Science Team's Biodiversity Quiz, which invited visitors to test their powers of observation by comparing photographs of bees and butterflies uploaded by the general public with the species catalogue to identify the correct species. There was also a demonstration of the Open Field Lab kit, which is used to broadcast live student fieldcasts through the OU's Stadium Live system. Other stands were showing:
- Our Social Media Analytics work on online misogyny, which was recently featured in WIRED.
- Our learning analytics platform (OU Analyse), which has been shortlisted for a Times Higher Education Award. The OU Analyse Team are OU Research Excellence Awards 2019 winners.
- Our scholarly data platform, CORE, which now attracts more web traffic than the OU or FutureLearn websites. The CORE team are also OU Research Excellence Awards 2019 winners.
- Our collaboration with Springer Nature on scholarly analytics, which has produced a new standard classification scheme for Computer Science as well as a solution for automatic metadata generation, now in routine use at Springer Nature.
- Our blockchain-based accreditation work, which underpins the £20M-funded Institute of Coding and was highlighted at the OU's TEDx event.
- Our award-winning MK Data Hub, which was recently applied to power the world's first Smart City Robot competition in Central Milton Keynes.
Back in 1994, the KMi founders had a vision of what the future of knowledge and media would be like, and KMi was created to implement that vision. Over these 25 years, KMi has established itself as a world-class research centre, and KMiFest was a great way to come together to celebrate the impact that KMi's intensive research has delivered.
CORE was presented with the Outstanding Impact of Research on Society and Prosperity Award at The Open University's Research Excellence Awards Ceremony 2019, which took place at the MK Stadium on October 23rd. Over 150 researchers, academics and support staff attended, with Professor Kevin Hetherington, Pro Vice Chancellor for Research, Enterprise and Scholarship, hosting. The new Vice Chancellor of the Open University, Professor Tim Blackman, also attended and welcomed the guests at the beginning of proceedings. The awards were presented by Professor Monica Grady, and we were extremely lucky to also be joined by Professor Dame Jocelyn Bell Burnell, who presented the Open University's 50th Anniversary Prize for Research. Representing CORE were Dr. Petr Knoth, Matteo Cancellieri, Lucas Anastasiou, Bikash Gyawali, David Pride and Alan Fletcher. Balviar Notay of Jisc, a key partner of The Open University in delivering CORE, also joined the awards ceremony. Dr. Petr Knoth commented: "I am delighted to win this award and would like to thank everyone who contributed to the development of CORE since its start in 2010, including our key partner Jisc as well as other funders, our users and commercial customers and our fantastic and talented current and former staff. It has been a great pleasure for me to be able to lead this fantastic team that is so passionate about the mission of opening up research knowledge to all, including not just researchers, funders and librarians, but also the millions of our users from within the general public who can discover and access the results of research they have contributed to by paying their taxes. It has been a privilege for me to be able to run this project from within the Open University which fully embraces the mission of openness, lifelong education and knowledge sharing. 
I hope the work of my team, a service that has become an essential part of the open access infrastructure, will contribute to making the Open University a centre of excellence for Open Research in the future."
CORE has just made it into the top 5,000 websites globally according to the Alexa Global Rank, which is calculated from a combination of daily visitors and page views on a website over a three-month period. As of today, CORE ranks at 4,924, having climbed 871 places over the last 90 days. The improvement is impressive considering that academic websites typically experience a seasonal slowdown during the summer break, so the rank is likely to improve even further. For comparison, here are the ranks of the Open University (10,605), the British Library (17,925), the Directory of Open Access Journals (24,255) and OpenAIRE (508,394). These figures show the strong value for money that CORE offers to society.
Success in research and innovation should primarily build on and depend on clarity of thought, innovation of ideas, and integrity of processes, rather than on external factors such as prior reputation or levels of resources. Open Science and Responsible Research and Innovation aim to bring equity and inclusivity to research. Yet could policy interventions in these directions actually worsen existing inequalities? ON-MERRIT studies "Matthew effects" of cumulative advantage in Open Science and Responsible Research and Innovation across research, industry and policy-making, through a mix of sociological, bibliometric and computational approaches. Where we discover such effects at play, we will make policy recommendations to mitigate or negate them. Dr. Petr Knoth explains: "ON-MERRIT will help us to better understand the flaws of the axiomatically established indicator-based incentives system that is currently deeply driving academic practice. This understanding will enable us to apply data-driven approaches to seek new counter-measures that reward based on merit and incentivise good research practices, such as reproducibility, transparent research workflows, and open research data and software sharing. The project will build on the expertise acquired by the Big Scientific Data and Text Analytics (BSDTAG) group in the area of open science and big data analytics which has been developed through a series of KMi projects supporting the CORE (core.ac.uk) service over the last 8 years." ON-MERRIT will launch in October 2019 and run until March 2022, with total funding of 1 million euros from the EC's Horizon 2020 programme. 
Partner Consortium:
- KNOW-CENTER GMBH - Research Center For Data-Driven Business & Big Data Analytics, Graz (AT)
- TU Graz - Institute of Interactive Systems and Data Science, Graz (AT)
- THE OPEN UNIVERSITY - Knowledge Media Institute (UK)
- UNIVERSIDADE DO MINHO - Minho (PT)
- GEORG-AUGUST-UNIVERSITÄT GÖTTINGEN - Göttingen State and University Library (DE)
Related Links: Original Press Release
CORE participated in the Open Repositories conference (10-13 June 2019), which took place in Hamburg, Germany. This year's conference theme was "All the user needs", and CORE received much attention and participated actively with five presentations:
- Assessing compliance with the UK REF2021 Open Access Policy
- Comparing the performance of OAI-PMH with ResourceSync
- CORE Analytics Dashboard
- Analysing the performance of open access papers discovery tools
- The future of scholarly communications professionals
Read more… Related Links: CORE's conference presentations
CORE, a global aggregator of open access content and the UK's national aggregator, will assist UK Research and Innovation's audit for REF2021 by supplying the deposit date information for all UK REF outputs. Compliance with the REF2021 Open Access Policy is established when an author deposits the post-print, or Author's Accepted Manuscript, in a repository (institutional or subject) within 90 days of its acceptance. The REF audit committee will consult CORE to discover the deposit date and decide whether or not an output is compliant with the policy. Research England's REF Audit Guidance specifically states that: "40. We will undertake verification of the dates that outputs became publicly available, particularly where they were published early in the REF period or are marked as 'pending' publication (for example, by obtaining a letter from the publisher). This will include checking the publication year against the Crossref database and against Jisc CORE" …. "46. We will assess each HEIs' overall compliance with the REF 2021 open access policy by: … iv. Using Jisc CORE, comparing the datePublished and depositedDate and identifying where the number of days between the two dates is greater than 92." .... "49. Where there is insufficient evidence to demonstrate a robust and well-managed process for open access, we will identify a set of outputs from each submission made by the HEI, and request further information to verify whether they are compliant with the policy, or whether an exception applies. Outputs may be selected randomly, or based on information in unpaywall.org or Jisc CORE, or a combination of the two. We will select outputs that have been returned as compliant with the policy, and/or outputs that have been returned with exceptions." For more information, see the "Audit Guidance" here. Related Links: Audit Guidance
KMi researchers Dasha Herrmannova, Nancy Pontika and Petr Knoth have won the Vannevar Bush Best Paper Award at the Joint Conference on Digital Libraries (JCDL 2019) for their paper titled "Do Authors Deposit on Time? Tracking Open Access Policy Compliance". JCDL 2019 is an A* conference (the highest rank) and the world's highest-ranking venue for digital libraries research, within the top 4% of all computer science conferences according to the Computing Research & Education Conference Portal. JCDL took place this year at the University of Illinois at Urbana-Champaign, United States. The paper, which uses data from CORE (core.ac.uk) to quantify the growing traction of open access, received media attention even before its presentation at JCDL 2019 and was featured in an article in Physics Today. Dasha Herrmannova says: "I am utterly astonished and still can't get over the fact that we won the best paper award last night at this amazing conference." Petr Knoth says: "It has been a pleasure to work with this amazing team. We went through many revisions on this paper as the work turned out to be more complicated than we originally anticipated, but it paid off." Even more impressively, this is the second Best Paper Award in a 12-month period at key digital libraries conferences for the Big Scientific Data and Text Analytics Group (BSDTAG). David Pride and Petr Knoth won the Best Paper Award at TPDL, the leading European digital libraries conference, in Porto, Portugal in September 2018 for their paper "Peer review and citation data in predicting university rankings, a large-scale analysis". Related Links: Study quantifies the growing traction of open access Do Authors Deposit on Time? Tracking Open Access Policy Compliance
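The date comparison described in paragraph 46(iv) of the guidance can be illustrated with a short Python sketch. It assumes each output is available as a hypothetical `(output_id, date_published, deposited_date)` tuple; the record layout and function names are illustrative only and do not reflect how CORE or Research England actually implement the check. Note that the guidance's 92-day threshold compares publication and deposit dates, whereas the policy's 90-day requirement is measured from acceptance.

```python
from datetime import date

# Threshold quoted in the REF Audit Guidance, paragraph 46(iv).
MAX_LAG_DAYS = 92


def deposit_lag_days(date_published: date, deposited_date: date) -> int:
    """Days from publication to deposit (negative if deposited before publication)."""
    return (deposited_date - date_published).days


def flag_late_deposits(records):
    """Return the ids of outputs whose deposit lag is greater than MAX_LAG_DAYS.

    `records` is a hypothetical iterable of
    (output_id, date_published, deposited_date) tuples.
    """
    return [
        output_id
        for output_id, published, deposited in records
        if deposit_lag_days(published, deposited) > MAX_LAG_DAYS
    ]


if __name__ == "__main__":
    sample = [
        ("output-1", date(2019, 1, 10), date(2019, 2, 1)),  # 22 days
        ("output-2", date(2019, 1, 10), date(2019, 6, 1)),  # 142 days
        ("output-3", date(2019, 3, 1), date(2019, 6, 1)),   # exactly 92 days: not flagged
    ]
    print(flag_late_deposits(sample))  # ['output-2']
```

Using "greater than 92" rather than "at least 92" mirrors the wording of the guidance, so a deposit made exactly 92 days after publication is not flagged.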
CORE, the world's largest aggregator of open access scientific content and Turnitin, a global leader in plagiarism detection software, have entered into a collaboration. Using CORE's FastSync service, Turnitin's proprietary web crawler will search through CORE's vast global database of open access content and metadata—135 million metadata records from over 3,700 data providers and counting—to check for text similarity. "As the scholarly publishing industry evolves, Turnitin's services must similarly adapt," said Valerie Schreiner, Turnitin SVP Business and Corporate Development. "This partnership with CORE ensures that our database remains at the forefront of publishing trends and can continue to best serve the needs of our customers and partners." Access the Press Release here. Related Links: Press Release
A Nature article titled "Data sharing and how it can benefit your scientific career", which discusses the importance of opening up research data and how data sharing could benefit researchers, mentions the FOSTER Open Science eLearning Portal, developed at KMi. The portal was established by the EU-funded "Facilitate Open Science Training for European Research" (FOSTER) project and its continuation, "Fostering the practical implementation of Open Science in Horizon 2020 and beyond" (FOSTER Plus). FOSTER offers more than 1,300 training resources, 45 courses (offered either self-paced or moderated) and five learning paths leading to specialisations in Open Science. KMi's Big Scientific Data and Text Analytics Group (BSDTAG) has participated in both FOSTER projects. KMi fully developed and hosts the training technology and has also contributed to the creation of the training content and courses. Related Links: FOSTER portal
A study by Drahomira Herrmannova, Nancy Pontika, and Petr Knoth of KMi has been featured in Physics Today, the flagship publication of the American Institute of Physics (AIP). The study measured how long academics took to deposit some 800,000 papers in repositories relative to when those papers were published. The bibliometric data for the study came from KMi's CORE. As the Physics Today article noted, the study found that while the time to deposit has been decreasing globally, the change has been particularly pronounced in the UK. In fact, since 2016, UK-based scientists have been posting their papers online more quickly than those in the other four nations with the highest number of papers in the dataset: the US, the Netherlands, Italy, and Switzerland. The REF 2021 Open Access Policy, which requires papers to be deposited within three months of their acceptance date, may have accelerated this trend in the UK. According to the authors, the key message of the paper is that this observation supports the argument for including a strictly time-limited deposit requirement in OA policies. The study also found significant differences between deposit practices at different universities, suggesting that institutions play an important role in supporting Open Access. The study will be presented at the ACM/IEEE Joint Conference on Digital Libraries in Urbana-Champaign, IL, in June. The code and the dataset used in the study are available online. Related Links: Physics Today article The study
We are very excited to announce that CORE has released a new front-end, marking the end of Phase 1 of front-end improvements, which will continue with two more phases. The key highlights of the new UI are:
- A more modern yet functional look and feel.
- Support for mobile devices.
- A new and better presentation of CORE's mission and services.
- Cross-browser support covering over 95% of CORE's users.
- Accessibility improvements.
- Removal of single-point-of-failure dependencies, taking full advantage of CORE's high-availability infrastructure.
But the end of one thing is the start of another. The objectives of Phase 2 are now:
- Taking CORE's search experience to a new level.
- New functionalities for the online CORE Reader.
- Improvements to some existing static pages.
Special thanks to everyone involved in this release: Viktor Yakubiv, Tom Davey, Matteo Cancellieri, Balviar Notay, Samuel Pearce, Sergei Misak, Svetlana Rumyanceva, Nancy Pontika and Petr Knoth. Related Links: CORE - The world's largest collection of open access research papers
CORE usage increased dramatically in 2018 and hit the 10 million monthly active users mark in January 2019 (10.41 million users). This is a 571% increase in users compared to January 2018. As of January 2019, CORE was the 5,448th most used website globally according to the independent Alexa Rank, which is calculated from a combination of daily visitors and page views on a website over a three-month period. To put this into perspective, at the time of writing, the rank indicates that CORE has significantly more users than FutureLearn (rank 6,083), The Open University (rank 8,849), the British Library (rank 12,702), Jisc (rank 75,663) and many other significant organisations. Related Links: Alexa Global Rank for CORE
CORE, the world's largest aggregator of open access scientific content, and Naver, South Korea's number one search solution, have entered into a collaboration that will make CORE's content available to 42 million Naver users. As part of the collaboration, Naver ingests data collected by CORE to enrich its Naver Academic search system with millions of open access papers. The aim of both services is to provide free access to scientific publications and make the experience seamless. Read more Related Links: Read more on the Jisc website
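As a quick sanity check on the growth figure, the January 2018 baseline can be back-calculated from the January 2019 figure. This sketch assumes that a "571% increase" means the January 2019 count equals the January 2018 count multiplied by 6.71; the resulting baseline is our own derivation, not a figure reported by CORE.

```python
def implied_baseline(current: float, pct_increase: float) -> float:
    """Return the starting value implied by a percentage increase.

    Assumes "X% increase" means current = baseline * (1 + X / 100).
    """
    return current / (1 + pct_increase / 100)


# January 2019 figure quoted above, in millions of monthly active users.
jan_2019_users_m = 10.41

# Back-calculated (not reported) January 2018 figure.
jan_2018_users_m = implied_baseline(jan_2019_users_m, 571)
print(f"Implied January 2018 users: {jan_2018_users_m:.2f} million")
# prints "Implied January 2018 users: 1.55 million"
```

Under the alternative reading, "571% of" the earlier value, the baseline would instead be 10.41 / 5.71, or about 1.82 million, so the interpretation matters when quoting the growth rate.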
The Knowledge Makers recently took over the Berrill Theatre and Mezzanine for their fifth, and rather special event. Over 100 attendees from across all faculties took part on the day and joining them were two guests from The Raspberry Pi Foundation; Philip Colligan, CEO and Dr. Sue Sentance, Chief Learning Officer. The event kicked off with a 'Raspberry Research' showcase where OU researchers displayed their current research or teaching projects that are using Raspberry Pi. Eleven teams showed off incredible variety of use-cases for the single board computer. KMi demonstrated a strong presence with their OpenBlockchain and GreenData project teams, ably represented by Michelle Bachler and Chris Valentine, attracting a good deal of attention from the attendees. The KMi SciRoc team brought along some of their recent developments in working towards bringing robots to smart cities. Other researchers from across the OU were also in attendance, notably from the OpenSTEM labs who showed off their incredible Mars Rover, the MAZIZONE team who brought a range of interactive and engaging displays. Teams from STEM also took part with projects from healthcare (STRETCH) to networking, with the 'Network in a Box' being used to teach networking concepts via the OU Cisco Academy Following the showcase, Dr. Petr Knoth opened the keynote session with the results of a new investigation showing how Raspberry Pi is being used in research globally. The data that informed this research was drawn from the full-text articles held in the Core dataset. Excitingly, Core recently became the world's largest legal repository of full-text scientific articles. An engaging keynote by Philip Colligan about The Raspberry Pi and the foundation rounded the day off after which he was presented with a framed 'Raspberry Research word cloud' built using data from the Core research project. Overall, the event was a huge success. New partnerships and friendships were formed and a great time was had by all. 
The Knowledge Makers will be back in December for 'A Very Maker Christmas', which will take place in the Library at Walton Hall (date TBC). Visit http://knowledgemakers.kmi.open.ac.uk for details of this and all the other Knowledge Makers events and workshops.
CORE has been mentioned in a Nature article titled "How AI technology can tame the scientific literature." The article discusses how artificial intelligence (AI) helps researchers, and anyone else who needs scientific information, discover new knowledge in the vast body of available scientific literature. It is estimated that up to two research papers are published every minute, making it difficult for anyone to retrieve, read and digest all this content. As a result, new services that use machine learning, natural language processing and other algorithms are emerging. CORE was mentioned in this context because of its collaboration with Iris.ai, an AI-powered literature-exploration tool that relies entirely on data supplied by CORE through its API. CORE provides a number of data services and can offer enterprise machine access to a large corpus of research papers through a newly developed service called CORE FastSync. Related Links: How AI technology can tame the scientific literature
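The "two papers per minute" estimate is easier to grasp as an annual figure. Taking the quoted upper bound at face value and assuming, purely for illustration, a constant rate over a 365-day year:

$$2\ \frac{\text{papers}}{\text{min}} \times 60\ \frac{\text{min}}{\text{h}} \times 24\ \frac{\text{h}}{\text{day}} \times 365\ \frac{\text{days}}{\text{yr}} = 1{,}051{,}200 \approx 1.05 \times 10^{6}\ \frac{\text{papers}}{\text{yr}}$$

That is roughly a million papers a year at the upper bound, far more than any individual could read.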
The best paper award at the 22nd International Conference on Theory and Practice of Digital Libraries (TPDL 2018) went to the paper by David Pride and Petr Knoth titled "Peer review and citation data in predicting university rankings, a large-scale analysis." The paper presents the largest analysis of REF2014 data to date (145,000 submitted papers and 7 million citations across all 36 REF Units of Assessment, i.e. disciplines), examining the link between peer review, conducted by REF-nominated panels, and bibliometric indicators. The study found surprisingly high correlations between REF results at the institutional level (Grade Point Average, GPA) and simple bibliometric indicators. This indicates that the 2014 REF results could have been predicted to a high degree of accuracy using automated techniques for about a third of the disciplines, namely those with high average citations per paper. If such an approach were adopted for just those disciplines, it could save UK universities and Research England about £50 million every time a national exercise is run, and even more if further disciplines adopted a similar approach. Since the preprint of this study was made available, a number of researchers have contacted us to confirm that they have obtained similar results. The findings are now being discussed with Jisc, which funds the project, in order to advise Research England on next steps. TPDL is the most highly regarded conference in the area of digital libraries in Europe and the second worldwide. TPDL 2018 took place in Porto, Portugal. Pride, D. and Knoth, P. (2018) Peer review and citation data in predicting university rankings, a large-scale analysis. Theory and Practice of Digital Libraries (TPDL) 2018, Porto, Portugal. Lecture Notes in Computer Science, Springer. https://arxiv.org/abs/1805.08529
On 30 May, the Knowledge Makers organised the first 3D printing workshop to take place at KMi. Twenty-three people from all OU faculties attended, got hands-on with the software tools used to design 3D objects, and saw some 'live' printing and a range of finished examples. The attendees were given a brief overview of OpenSCAD, an open-source 3D design tool, and were then tasked with designing an object of their choice in just 90 minutes. The results were truly amazing, clearly demonstrating what happens when you combine engaged and enthusiastic participants with powerful tools. Some wonderful designs were realised, including chairs, dice, wheels, ladders (albeit very small ones) and even Tower Bridge! The outstanding design of the day, however, went to team 'Piggy Bank', who worked flawlessly together, with each member producing one section of the final model. See the photos below for the impressive end results. The session brought together people from across the OU and introduced them to a new skill set. Attendees were also introduced to some of the amazing facilities available on campus, including the FabLab and the Rapid Prototyping Lab. There is a wealth of talented people and fantastic resources here at the OU, and we firmly believe that events like this one can help bring the two together. About the Knowledge Makers: We are a growing group of enthusiastic makers, hackers and tinkerers who hold bi-monthly meetups at the Walton Hall campus. We actively encourage makers and crafters of ALL varieties to get involved. It does not matter what your making passion happens to be; we believe sharing your passion is what makes a difference. Related Links: Knowledge Makers on the web, Knowledge Makers on GitHub, Twitter
KMi research student Drahomira Herrmannova (Dasha) has successfully defended her PhD thesis titled "Mining Scholarly Publications for Research Evaluation." While current research metrics evaluate the excellence of a publication based on the number of interactions in the scholarly network, such as the number of times it has been cited (bibliometrics) or downloaded (altmetrics), this thesis explores the use of publications' full texts in research evaluation. The thesis first investigates what research quality is and then defines a new class of research evaluation metrics called Semantometrics, along with its first metric, called contribution. It then demonstrates, on a newly created True Impact Dataset, that the contribution metric can be more effective in identifying key research than existing research evaluation metrics. Dasha's examiners were Prof Enrico Motta and Dr Iana Atanassova of the University of Franche-Comté. Dasha's supervisors were Dr Petr Knoth and Prof Zdenek Zdrahal. Dasha will continue her research in this area at Oak Ridge National Laboratory in the United States. All the best, Dasha!
The CORE service is working in partnership with ProQuest to deliver more content within its library discovery services (Ex Libris Primo and Ex Libris Summon). What does this mean for end users? Search results will bring back more relevant content from OA repositories worldwide in addition to existing library collection records, so users will not have to go to a separate search interface to run the same query.
A new blog launched yesterday by Jisc focuses on its Open Metrics project, which aims to support the development of new research metrics. Following the publication of The Metric Tide report in 2015, there is increasing awareness within the sector of the need for new research evaluation metrics that move beyond the limitations of traditional citation-based metrics. The launch included a piece introducing David Pride, a PhD research student at KMi, whose current research looks at the large-scale evaluation of research articles using their full text. You can read the post here: https://openmetrics.jiscinvolve.org/wp/2017/11/citations-created-equal/ And the Open Metrics blog can be read here: https://openmetrics.jiscinvolve.org/wp/about/
EIFL's invitation to KMi's CORE project to take part in a workshop for researchers from developing countries pays dividends for both the participants and CORE. In June 2017, EIFL invited the global open access full-text aggregator CORE to take part in an Open Science train-the-trainer course for universities and research institutions in EIFL partner countries. Read more in EIFL's blog post, and visit CORE's blog to watch videos of the workshop participants talking about CORE. Related Links: EIFL blog post, CORE blog post
Laworm, an aggregator of online scientific tools aimed mainly at scientists, has listed CORE as a top tool and resource that helps make science open and collaborative. Related Links: Tools and Resources to make Science Open and Collaborative
From 25 to 27 October, OpenMinTeD participated in the FORCE2017 Research Communication and e-Scholarship conference, which brings together a diverse group of people interested in changing the way scholarly and scientific information is communicated and shared. On Friday 27 October, the OpenMinTeD partners held two workshops: one on "How to improve interoperability across publisher platforms to support text and data mining" and another on "Enhancing the real impact of scholarly publications through text and data mining". At the first workshop, the Open University partners from the CORE project presented their work on the Publisher Connector. This work involved surveying publishers about the machine-accessible interfaces they offer for Open Access content; creating the Publisher Connector, a tool that harvests Open Access content from publisher systems and exposes it via the ResourceSync protocol; and building the technical expertise directory, which documents how harvesting from publisher platforms can be achieved. Related Links: Original blog post
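The Publisher Connector exposes harvested content through ResourceSync. As a rough illustration of what a consuming client does, the sketch below reads a ResourceSync resource list (a Sitemap `urlset` carrying `rs:md` metadata, as defined in the ResourceSync framework specification) using only Python's standard library. The sample document, its `example.org` URLs and the function name `parse_resource_list` are hypothetical; this is not the Publisher Connector's own code.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ResourceSync documents (Sitemap + rs terms).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical resource list; the URLs are placeholders, not real endpoints.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-10-27T09:00:00Z"/>
  <url>
    <loc>https://example.org/articles/1.pdf</loc>
    <lastmod>2017-10-01T12:00:00Z</lastmod>
    <rs:md type="application/pdf" length="84532"/>
  </url>
  <url>
    <loc>https://example.org/articles/2.pdf</loc>
    <lastmod>2017-10-02T08:30:00Z</lastmod>
  </url>
</urlset>"""


def parse_resource_list(xml_text):
    """Return one dict per <url> entry of a ResourceSync resource list."""
    root = ET.fromstring(xml_text)
    # The top-level rs:md element declares which capability this document is.
    md = root.find("rs:md", NS)
    if md is None or md.get("capability") != "resourcelist":
        raise ValueError("not a ResourceSync resource list")
    resources = []
    for url in root.findall("sm:url", NS):
        entry = {
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
        }
        # Per-resource metadata (e.g. MIME type) is optional.
        rs_md = url.find("rs:md", NS)
        if rs_md is not None and rs_md.get("type"):
            entry["type"] = rs_md.get("type")
        resources.append(entry)
    return resources


if __name__ == "__main__":
    for resource in parse_resource_list(SAMPLE):
        print(resource)
```

A real harvester would fetch the list over HTTP, compare `lastmod` values with its previous run and download only new or changed resources; large sources may instead publish a resource list index (a `sitemapindex`) pointing to several lists.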
1. DOES YOUR ORGANIZATION HAVE AN OPEN ACCESS STRATEGY? AND HOW ARE YOU IMPLEMENTING IT? CORE is an Open University (OU) project jointly funded by the OU and Jisc. CORE is a global full-text aggregator of Open Access content, harvesting institutional and disciplinary repositories as well as Open Access and hybrid journals. Today, the CORE team at the OU runs the CORE service, the world's largest aggregator of open access research publications from repository and journal systems at the full-text level. CORE currently harvests more than 3,700 repositories and 6,000 journals, and holds 80 million metadata records and almost 8.5 million full texts. Our mission is to aggregate all Open Access research outputs and make them available to the public. We support citizens' right to access information, and we have established a wide set of services for that purpose. All our services are free of charge to the end user; they provide access to Open Access content in both human- and machine-readable form and enable users to develop their own applications using our content. Related Links: Original blog post
Since 2012, members of KMi's CORE team, headed by Petr Knoth, have organised WOSP (Workshop on Mining Scientific Publications), held each year as part of JCDL (Joint Conference on Digital Libraries). Previously held in locations as diverse as London and Indianapolis, this year the 6th annual international WOSP workshop took place at the University of Toronto. Over 100 academics joined us to hear presentations from 14 authors on a wide range of topics, from using machine learning to detect academic plagiarism to using text and data mining to interrogate a bilingual scientific repository. Our very own Petr Knoth presented new research on recommender systems, and the slides from this presentation are available online. We also had several demonstrations: Victor Botev gave a really nice overview of the 'Iris.ai – the Science Assistant' project, while Ron Daniel from Elsevier presented the 'Content Analytics Toolbench (CAT)'. We had fantastic keynote speeches, first from Jevin West (Assistant Professor at the Information School at the University of Washington and co-director of the DataLab), who introduced us to VizioMetrix, a platform that extracts visual information from the scientific literature. Our second keynote speaker was Waleed Ammar, research team lead at Semantic Scholar, who spoke about their latest work on citation extraction and recommender systems; researchers can try out their Cite-o-Matic recommender online. Many thanks to all our speakers, authors and attendees; hopefully we'll see many of you next year at WOSP 2018! Related Links: Full workshop details here
FIT4RRI is intended to help bridge the gap between RRI (Responsible Research and Innovation) and Open Science and to promote viable strategies for bringing about institutional change in RFPOs (Research Funding and Performing Organizations). FIT4RRI starts from the assumption that there is a serious gap between the role RRI and OS (Open Science) could play in helping RFPOs manage the rapid transformation processes affecting science (especially its science-in-society aspects) and the actual impact RRI and OS are having on RFPOs, research sectors and national research systems. The project will act on two key factors: (1) enhancing competencies and skills related to RRI and OS by improving the RRI and OS training offer, and (2) institutionally embedding RRI/OS practices and approaches by promoting the diffusion of more advanced governance settings. "Through FIT4RRI we want to engage hard scientists in responsibility matters and promote RRI and OS as drivers for institutional change in research funding and performing organizations. We look at science as a tool to create bridges towards society," says Andrea Riccio, Project Coordinator, UNIROMA1. CORE makes two contributions to this project: it will create the platform to host the RRI resources, training tools and events, and it will run an RRI experiment focusing on text and data mining. The FIT4RRI project was funded under the European Union's Horizon 2020 programme after a competitive one-stage selection process. The project started with a kick-off meeting at Sapienza University in Rome on 12 and 13 June 2017 and will be funded for three years. Related Links: http://fit4rri.eu/
An online editing and proofreading company, Scribendi, has recently put together a list of the top 21 freely available online databases. It is a pleasure to see CORE listed as the number 1 resource. CORE was included thanks to its large volume of open access, free-of-charge content: 66 million bibliographic metadata records and 5 million full-text research outputs. Our content originates from open access journals and repositories, both institutional and disciplinary, and can be accessed via our search engine. In addition, we offer an API and datasets for programmatic access to this content, enabling the development of new artificial intelligence-based applications for scientists and the text and data mining of scientific literature. Related Links: The Top 21 Free Online Journal and Research Databases
CORE, a harvesting service that aggregates open access content from journals and repositories all over the world, currently provides 5 million open access full-text papers. "In the last year, we have managed to scale up our harvesting process. This enabled us to significantly increase the amount of open access content we can offer to our users. With more and more open access content being made available by data providers, thanks to recent open access policies, CORE now also captures and provides access to a higher percentage of global research literature," says CORE's founder, Dr Petr Knoth. With 66 million metadata records and 5 million full texts from 102 countries in 52 different languages, CORE is now the world's largest full-text open access aggregator. CORE embraces the vibrant collections of both institutional and disciplinary repositories, and its large volume of scholarly outputs ranges from scientific research papers to grey literature, and from Master's to doctoral theses. In addition, it provides a metasearch across all the peer-reviewed scientific articles published in open access journals. CORE's open access collection can be accessed from our search engine (https://core.ac.uk). For those interested in using our data for other purposes, such as building services or applying text and data mining, we offer all the data for free via an API and a dataset. Related Links: CORE
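For readers who want to try programmatic access, the sketch below only builds a search request for CORE's API. It assumes the version 3 endpoint `https://api.core.ac.uk/v3/search/works`, a `q` query parameter and Bearer-token authentication, as described in CORE's current API documentation; the API available when this post was written may have used a different version and paths, so verify these details before use. The helper names `build_search_request` and `search_works` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: v3 search endpoint; check CORE's API documentation before use.
CORE_SEARCH_URL = "https://api.core.ac.uk/v3/search/works"


def build_search_request(query: str, api_key: str, limit: int = 10) -> Request:
    """Build (but do not send) a GET request for a full-text search."""
    params = urlencode({"q": query, "limit": limit})
    return Request(
        f"{CORE_SEARCH_URL}?{params}",
        # ASSUMPTION: the API key is passed as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )


def search_works(query: str, api_key: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON response (needs network + key)."""
    with urlopen(build_search_request(query, api_key, limit)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the URL and headers easy to inspect and test without network access or a registered API key.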
Last month, the French Association of Directors and Officers of University Libraries and Documentation (ADBU) released a report entitled "Text and Data Mining in Higher Education and Public Research", which mainly explores the UK and French copyright exceptions for text and data mining (TDM). In more detail, the report lists the benefits of TDM in scientific research, identifies the primary threats to its adoption and practice (i.e. legal and technical), presents the need for a technical infrastructure, and describes the barriers to motivation and the developments necessary in the field. To understand the level of TDM adoption, and the lack thereof, the report presents various case studies, one of which is the CORE project. CORE, an aggregation service currently holding around 4.5 million full texts and 66 million metadata records, has been providing infrastructure for TDM via its main services, the CORE API and the CORE Datasets. As the report puts it: "Text-mining at scale cannot take place without infrastructure. Investment is needed in the technologies used to aggregate, normalise, interrogate and preserve TDM materials". CORE's services offer open access content and are provided to everyone free of charge. In addition, CORE is participating in the EU-funded project OpenMinTeD, which aims to create a TDM infrastructure focusing on legal, technical, policy and interoperability issues; CORE's role is to act as an open access scientific content provider. Beyond the technical challenges, there are also legal requirements that create obstacles and limit the incentives for TDM. Even though both UK and French copyright law have been amended, there are still grey areas that prevent researchers from applying TDM practices. Furthermore, the legal framework is not harmonised across countries, and in some it does not even exist.
The report states that "changes to copyright law must be accompanied by improvements in access, infrastructure, skills and incentives for TDM". In that context, and as CORE already participates technically in the promotion of TDM, it welcomes all efforts to advance TDM and is open to assisting with the development of new policies and the improvement of existing ones, based on its own TDM experience. Related Links: Access the full report
On Thursday 17 November at the Town Meeting, the whole of KMi celebrated Zdenek's 25 years of working at the Open University, within KMi. Petr Knoth led a tribute from Zdenek's team, after which we commemorated the occasion in epicurean style, including a personalised giant cookie and an edible pie chart made by the Analyse team! We reflected that Zdenek has done some great work at the Open University and that 25 years marks an incredible achievement. Throughout his time here, Zdenek has contributed to the fields of Artificial Intelligence, Case-Based Reasoning, Design, Knowledge Sharing, Machine Learning and Predictive Modelling. KMi is sure that Zdenek will continue doing great work (hopefully for another 25 years) at the Open University!
Drahomira Herrmannova and Petr Knoth have won the Best Poster Award at JCDL 2016 in Newark, USA, with their contribution "Semantometrics: Towards fulltext-based research evaluation." The timing was very good, as the full experimental report on semantometrics commissioned by Jisc had been published a week before the conference. The CORE team at KMi also organised a successful 5th International Workshop on Mining Scientific Publications (WOSP 2016), attended by key people in the area of text and data mining of research papers from both Europe and the USA. This year the workshop featured two keynotes: Yuxiao Dong of Notre Dame University gave a talk titled "AMiner: Towards Understanding Big Scholarly Data", and Michael J. Kurtz of the Harvard-Smithsonian Centre for Astrophysics presented "Astrophysics Data System: The Joy of Text". At the workshop, Drahomira Herrmannova also presented a joint long paper with Petr Knoth titled "An Analysis of the Microsoft Academic Graph." This year the WOSP workshop was sponsored by the OpenMinTeD project, in which KMi participates, and we invited two speakers in connection with it. Stelios Piperidis of Athena Research Centre gave a talk on "Making sense of scientific textual content", and Peter Mutschke of GESIS discussed the "Challenges and potential of text mining in scholarly information retrieval."
Last week, the CORE team attended the 11th Annual Conference on Open Repositories, an international conference aimed mainly at subject and institutional repository managers, focusing on open access, open data and open science tools, projects and services. The team had six submissions at the conference: 1. A workshop presentation on "How can repositories support the text-mining of their content and why?", where Nancy Pontika explained how repository managers should support text-mining practices and Petr Knoth described the technical requirements for enabling the text mining of repositories. The CORE team also organised the workshop as part of its involvement in OpenMinTeD, an EU-funded project on text and data mining. The workshop has been described in two blog posts: one hosted on the OpenMinTeD blog (which includes all workshop presentations) and another written by Rebecca Sutton Koeser, a workshop participant. 2. A full presentation on "Exploring Semantometrics: full text-based research evaluation for open repositories" by Petr Knoth. The presentation explored semantometrics, a new class of research evaluation metrics built on the premise that full text is needed to assess the value of a publication. (Presentation available here.) 3. A 24x7 presentation on the "Implementation of the RIOXX metadata guidelines in the UK's repositories through a harvesting service", where Matteo Cancellieri and Nancy Pontika described how the RIOXX metadata guidelines are now embedded as a new feature in the CORE Repositories Dashboard. (Presentation slides here.) 4. & 5. Two demo presentations during the Developer Track sessions.
The first was on "Mining Open Access Publications in CORE", where Matteo Cancellieri demonstrated the new CORE API; the second, entitled "Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking", saw Petr Knoth use the traditional Oxford vs Cambridge contest to show how to freely gather and compare the research performance of universities. (The code for both demo presentations is on GitHub.) 6. A poster on the "Integration of the IRUS-UK Statistics in the CORE Repositories Dashboard" by Samuel Pearce and Nancy Pontika, which showed the process of embedding the existing IRUS-UK statistics service into the CORE Repositories Dashboard. We were also delighted that our poster won the best poster award (yay!). We would like to thank all the conference participants who stopped by our poster, picked up the CORE freebies and voted for us! (You can access the poster here.) Because this conference focuses on repository services, which CORE both uses and is used by, CORE was also mentioned extensively in other presentations. For example, Richard Jones mentioned in his presentation on Lantern that the project uses the CORE API; Paul Walk described how CORE uses the RIOXX metadata application profile; the Repositories of the Future panel, organised by COAR, stressed the importance of aggregators in the repository environment, specifically naming CORE; and the "Ideas Challenge", a thought-provoking brainstorming exercise for groups of programmers and repository managers focused on making academics' lives easier, produced a runner-up proposal to use CORE to develop a cross-repository journal and topic browse interface. Finally, CORE was also presented in the Jisc poster on "Jisc's Open Access Services".
As a genuine supporter of open access, open data and open science, CORE is also participating in the EU-funded Facilitate Open Science Training for European Research (FOSTER) project, which presented a poster on the project's main activities and objectives and its e-learning platform. CORE built the portal and the e-learning platform. The past week has been a good one for the CORE team. We met old friends, made new ones and received valuable feedback from the community on our services, but most importantly we realised that the CORE service is integral to the repositories community. So, stay tuned! Related Links: CORE blog post link
At this year's Open Repositories 2016 Conference, an international conference for the scholarly communications community with a focus on repositories, open access, open data and open science, CORE had six items accepted: one paper, one Repository Rave presentation, one workshop, one poster and two showcases in the Developer Track and Ideas Challenge. In our presentations we will explore semantometrics, text and data mining, and the integration of RIOXX metadata and IRUS-UK statistics in the CORE Dashboard. In the two Developer Track sessions we will demonstrate, respectively, how to freely gather and compare the research performance of universities and how open access publications can be mined using the CORE API. Related Links: Here you can find the summaries of our proposals.
At this year's Jisc DigiFest, Dr. Petr Knoth was invited to sit on a panel discussing responsible research metrics. The panel was organised in the context of the recently published Metric Tide report commissioned by HEFCE, which looked into issues surrounding the use of quantitative research metrics in the REF. The other two panellists were Prof. Stephen Curry of Imperial College and Prof. Cameron Neylon of Curtin University. In his talk, Petr argued for the need to develop a range of new research metrics that make use of article full texts, which we call semantometrics. Petr also stressed that we need to move away from performance measures established axiomatically or ad hoc, without demonstrating on data their ability to capture aspects of research performance. These include, in particular, widely used higher-level metrics such as the h-index. "We need to move towards data driven approaches to the development of research evaluation metrics," he reiterated. Related Links: Presentation link
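To make concrete what an axiomatically defined metric looks like, here is the standard definition of the h-index (the largest h such that an author has at least h papers with at least h citations each) as a minimal Python sketch; the function name `h_index` is illustrative:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # later papers have even fewer citations
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))
```

The formula is fixed by definition: nothing in the computation is validated against independent evidence of research quality, which is the kind of construction the talk argues should be replaced by data-driven approaches.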
The ebook "Text Analytics: 28 Experts Share How to Achieve Business Value" gives insights into how large industries are exploiting big unstructured data to drive business value. The free ebook was created to demonstrate the benefits of text analytics to a wide array of companies, customer intelligence professionals and marketers. In it, Dr. Petr Knoth discusses how text mining of scientific literature can reveal meaningful connections that are otherwise hard to discover. Drawing on his experience as a Senior Data Scientist at Mendeley and as founder of COnnecting REpositories (CORE), a database that aggregates open access scientific papers, he gives three recommendations for successfully applying text analytics and deriving value from it: 1. the need for an evaluation framework with well-defined metrics, 2. the necessity of collecting representative ground truth data, and 3. realistic and clear communication with the customer. Knoth states that "text mining has so many application domains, it is absolutely incredible".
Dasha and Petr have participated in the challenge that forms part of the upcoming Web Search and Data Mining (WSDM) conference. The challenge, co-organised by Microsoft and Elsevier, was to assess the importance of scholarly articles using data from the Microsoft Academic Graph, a large heterogeneous graph comprising more than 120 million publications and the related authors, venues, organizations and fields of study. Dasha and Petr (team BletchleyPark) were the best of 32 teams in the training round of the competition and, after the validation round, were invited to take part in the second phase of the challenge as one of the eight best teams. They will also present their method at the workshop in San Francisco, California, in February. Related Links: WSDM Cup Challenge
KMi's project COnnecting REpositories (CORE) was featured in the June issue of the Best of Business Web newsletter. The editor, Robert Berkman, commented: "CORE ... is a real gold mine of a research site. You can perform precision searches by using the advanced search to quickly search via phrase or Boolean; limit by author, publisher and year; choose to only return articles available in fulltext; and search the entire text or limit to those found in the title and abstract. After the list of initial results are returned, you can further refine the list by publication type, language, journal and other fields. CORE also will suggest similar articles and displays these via a visually impressive interactive graph. While only 10% of the items are available in PDF fulltext, even for those that are not, full bibliographic information and an abstract are provided. Consider this site if you are looking for academic and scholarly papers from around the world, including those in languages other than English." The Best of Business Web is a monthly newsletter aimed at market researchers, information professionals, entrepreneurs and business librarians. The newsletter is run by the New School of Public Engagement, New York City, USA. This demonstrates the international attention CORE receives and its important role in promoting open access to scholarly scientific results.
The Open University's Charter Day celebrations concluded with the Learn About Fair yesterday. People from all over the OU found out more about a variety of KMi projects, including Engage and EDV. It was a particularly good opportunity for the latter to showcase the Democratic Replay tool as the UK General Election quickly approaches next month. The two-day fair was well attended, and we exhibited to several VIP guests. Zdenek was pleased to discuss OU Analyse with the Chancellor, Martha Lane Fox, who showed a real interest in the work behind it. Members of MK:Smart were also given the chance to meet Vice-Chancellor Peter Horrocks and Mark Lancaster, MP for North East Milton Keynes. There was clear interest in our technologies from different faculties, and we were approached about exhibiting our AR demo and CORE work at an OU conference in June. There was also lots of interest in Paul Hogan's AR app promoting MK:Smart. Take a look at it in action in the photo gallery.
KMi is proud to announce that there are two new doctors in our midst. In a turn up for the books, and after years of hard work, Hassan Saif and Petr Knoth both passed their vivas today. We celebrated with a bottle of bubbly and speeches from each candidate. Both were very thankful for the support they had received from colleagues and friends at KMi. Petr highlighted that "KMi is a great environment to work in," and Hassan commented "When I first joined KMi I dreamed that this day would come."
Today marked an exciting day for the OU: Baroness Martha Lane Fox was installed as our new Chancellor at the Milton Keynes degree ceremony at Milton Keynes Theatre. As part of the day's celebrations, KMi was invited to have a stand at the post-ceremony showcase at the Walton Hall campus in Milton Keynes. KMi presented its new pipeline technologies in three broad themes: the future of scholarly knowledge, a future of data, and the future of place. Our 'Future Place' theme is wide-reaching enough to have had its own stand at the showcase, where it showed the new work of MK:Smart through a future vision of Milton Keynes. Similarly, on the KMi stand we presented some 'Place'-themed technology, such as your individual library of texts via interactive eTextBooks, and your own laboratory via 'webcasting live and interactive'. The 'Future Scholar' theme showcased two projects: CORE, which is already 'plugged in' to the OU's Open Research Online service (ORO) and represents a vision of the future of open knowledge exchange as it reads, integrates and shares the world's open texts; and Rexplore, which maps the changing shape of scholarly disciplines via research people and publications. The 'Future Data' theme showcased three projects: OUAnalyse, which is creating a dashboard to help us predict and support student success via behavioural data; OUSocial, which reminds us that 'emotion' is a key part of any analysis and indicates how we might mine social spaces for learning-related emotion; and DiscOU, which presents a range of apps that leverage Linked Data to discover and connect OU resources to the world. We were delighted to be part of this exceptional occasion for both graduates and the wider OU community!
At the beginning of August, the OU Analytics team in KMi received the following letter from the Office of the Pro Vice-Chancellor (Academic): "Zdenek and team, at the office of PVC-A Team Away Day I asked colleagues to nominate people who they work with and who deserve a special thank you. The team were nominated for 'turning barriers into opportunities and ignoring boundaries to produce a weekly model for predictions whether students will progress'. Thank you from me and my team for all you do to support us." It is great to acknowledge the work of the Analytics team in breaking barriers and facilitating the work of the OU Student Support Teams University-wide. Well done all! Related Links: The OU Analyse project
Two mysterious boxes of quality champagne were delivered to KMi last Friday. After initial uncertainty about the sender and the true recipient, it was revealed that they had been sent by the London-based company Research Research Limited. The champagne was addressed to Petr Knoth and his team developing the CORE system as an expression of thanks for producing the service. The company has started using the CORE dataset and services to improve the performance of the classification algorithms it applies in production as part of its business. The adoption of CORE helped to dramatically boost their performance, as indicated by a substantial increase in F-measure. The champagne was sent as special thanks for the CORE outputs and for Petr's help in setting up this use case. The event raises an interesting question: should the next REF use champagne as one of its impact indicators?
KMi is to receive resources from Jisc to support 3 full-time staff to continue working on CORE and deliver it as a service. Jisc's decision to continue supporting CORE followed several developments. First, the Open Mirror feasibility study, commissioned by Jisc last year and published in June 2014, recommended that CORE be sustained. In parallel, Jisc asked KMi to create a Service Delivery Plan analysing the costs of sustaining CORE as a service. This Service Delivery Plan served as the basis for subsequent negotiations between the OU and Jisc in London in May 2014 about the implications of each service delivery option. The result of this meeting was an agreement to explore the possibility of delivering CORE as a joint OU and Jisc service starting from the second half of 2015. The meeting also laid the basis for specifying the work to be done from July 2014 to June 2015. This work is financially supported by Jisc and covers 3.2 full-time members of KMi staff. Related Links: Open Mirror Feasibility Study
KMi's work received high visibility at the 9th International Conference on Open Repositories (OR2014) in Helsinki, Finland, especially because KMi's CORE project was mentioned on numerous occasions in talks by non-OU conference participants. OR2014 is the main conference in the field of open science and open access repositories and this year attracted over 400 attendees. In addition, a large number of participants attended virtually, as sessions were also broadcast online. KMi's Petr Knoth, representing the CORE and FOSTER projects, gave one full-paper presentation, presented two posters and was also invited to sit on one panel. However, the highlight of the conference from KMi's perspective was certainly the fact that CORE was discussed in presentations by non-KMi people, sometimes even as a key enabling component. The first day of the conference hosted the Open Access Button workshop. The OA Button makes individual moments of injustice and frustration in accessing research outputs visible to the world. The workshop, chaired by Penny Andrews of Sheffield University, discussed the technical issues faced in implementing the OA Button and highlighted the use of CORE as an important component for discovering open access copies of research articles on the Internet. The second day of the conference featured a presentation by Martin Klein of Los Alamos National Laboratory about the HyberActive project. HyberActive provides a proactive service to archive web references from scholarly articles. A KMi visiting researcher, Dominika Koroncziova, and Petr Knoth helped the team at Los Alamos to integrate HyberActive with CORE and set up a demo for OR2014. During the presentation, Martin Klein showed how CORE sends notifications about new articles, and the references extracted from their full texts, to the archiving service, and described how this simplifies the management and preservation of research data, code and publications.
In the afternoon, Petr Knoth gave a full-paper presentation titled "My repository is being aggregated: a blessing or a curse?", authored by Petr in collaboration with Lucas Anastasiou and Samuel Pearce. Petr explained how repositories and aggregators need to create a mutually beneficial ecosystem in which usage statistics are shared, while preserving the distributed and open nature of the overall architecture. The next session was a panel organised by the Jisc Repositories Shared Services Project, featuring presentations and discussion from a set of services and projects (RIOXX, V4OA projects, RJB, IRUS-UK, SHERPA Services and CORE) that are seen as critical for the UK research infrastructure. One representative from each of these projects was invited to sit on the panel, which featured Jisc, the two Jisc centres of excellence EDINA and MIMAS, the University of Nottingham and KMi, the Open University. This was a lively 75-minute session that attracted a good number of questions from the audience. The long day continued with a poster on the new FOSTER project, presented by Eloy Rodriguez and Petr Knoth, and a poster on the Jisc RSSP project mentioning CORE, presented by Jisc with the assistance of the service providers. A few other posters, such as the London School of Economics poster, also mentioned CORE. The last day of the conference saw a presentation by Richard Jones of Cottage Labs demonstrating the outcomes of the Open Access Repository Registry project funded by Jisc, in which KMi participates. The presentation showed the exchange of data between the new registry and CORE. Overall, this was a very busy but rewarding week.
KMi celebrated The Open University's 45th 'Charter Day' today, which commemorates the historic signing of The Open University Charter, with the launch of the new OU Pipeline, a website dedicated to the delivery of Knowledge Media technologies into the OU. Showing off KMi pipeline technologies at Charter Day 2014, we presented a range of new work in three broad themes: the future of scholarly knowledge; a future of data; and the future of place. Our 'Future Place' theme is wide-reaching enough to have had its own stand at Charter Day, where it showed the new work of MK:Smart via a future vision of Milton Keynes, the home of the main OU campus. Similarly, on the KMi stand we presented some 'Place' themed technology, such as your individual library of texts via interactive eTextBooks, and your own laboratory via 'webcasting live and interactive'. The 'Future Scholar' theme showcased two projects: CORE, which is already 'plugged in' to the OU's Open Research Online service (ORO) and represents a vision of the future of open knowledge exchange as it reads, integrates and shares the world's open texts; and Rexplore, which maps the changing shape of scholarly disciplines via researchers and publications. The 'Future Data' theme showcased three projects: OUAnalyse, which is creating a dashboard to help us predict and support student success via behavioural data; OUSocial, which reminds us that 'emotion' is a key part of any analysis and indicates how we might mine social spaces for learning-related emotion; and DiscOU, which presents a range of Apps that leverage Linked Data to discover and connect OU resources to the world. All in all, some exciting KMi innovations to celebrate 45 years of Open University innovation! Related Links: The KMi Pipeline - OU sign in required, sorry!
A new policy stating that research outputs submitted to the post-2014 REF should be Open Access was announced on Monday, March 31st, accompanied by a circular letter to all UK Vice-Chancellors and Principals (see the link below). The policy requires all journal and conference publications with a UK HEI author to be deposited in an institutional or subject repository on acceptance for publication. Publications that do not comply with this requirement, including those made Open Access only retrospectively, will not be eligible for submission to the next REF exercise. The policy is fantastic news for all those who support unrestricted access to knowledge for all. It will enable millions of people who have consistently been denied access to research outputs, such as high school students, small and medium enterprises, government and the general public, to access the results of research funded by the taxpayer. The policy is the result of a rigorous consultation process. Input was also requested from KMi's Petr Knoth and Zdenek Zdrahal. Some of our previous recommendations, particularly those about unrestricted machine access, were considered by HEFCE, which has been interested in the possibility of using the CORE system, developed in KMi, for monitoring policy compliance. The last request for feedback, together with an early version of the policy, was sent to Petr Knoth and Zdenek Zdrahal a week before the policy announcement. While the policy certainly constitutes a dramatic and positive change, there are still some aspects we hope will be tweaked at a later stage. Many of them relate to the ability to re-use and text-mine research publications at a global scale. The link to our full response is available at the bottom of this page. Related Links: KMi's response to the HEFCE policy The HEFCE Open Access policy document
A study commissioned by the Knowledge Exchange (see the link below), a Danish Agency for Culture supported by a number of international funders, set out to identify Open Access services across the world that are key to the future of scholarly communication. The aim was to analyse the financial challenges these services face and create a shared strategy that would guarantee their sustainability. The services discussed in the study are used by millions of academics on a daily basis and are already an essential part of the research ecosystem. They include arXiv.org, EPrints, the Public Library of Science (PLoS), the Public Knowledge Project (PKP) and the Directory of Open Access Journals (DOAJ). The CORE system developed in KMi is among these services. When the study was published, the Knowledge Exchange funders organised a workshop in Utrecht, the Netherlands, inviting a representative of each of the services. KMi's Petr Knoth attended the meeting representing CORE. The meeting showed that the key services forming the Open Access ecosystem (Neil Jacobs of Jisc presented an overview of them) are often in very different situations. Some of them, such as the BASE system, receive a significant financial contribution from their own institution in the spirit of supporting Open Access, while other institutions see these services rather as an opportunity to generate profit to improve their own budget. These institutions either do not acknowledge their share of responsibility for the future of scholarly communication or do not embrace the ideas of openness of research outputs and education for all. Consequently, some of these services, such as DOAJ, have already left the academic environment, as they can be provided more efficiently through a not-for-profit company.
While a whole range of issues was discussed, the most critical included:
- The continuous need for funding: it is very difficult or even impossible for many of these services to charge the end user a fee, as this goes directly against their mission.
- Institutional greed: universities are often not willing to lower the overheads for these services. Libraries are typically not willing to contribute even a small percentage (in the range of 1%) of their budgets for commercial services and article subscriptions (Elsevier, EBSCO, SciVal, etc.) to Open Access services.
- Supporting a global service at the local level: universities and libraries are typically not willing to contribute financially to a service which benefits the whole world, not just them.
One of the outcomes of the meeting was that a package of these critical Open Access services should be created, and that certain funders that distribute money to universities and libraries, such as HEFCE, should mandate a financial contribution to the sustainability of Open Access services. This strategy is now being explored by the Knowledge Exchange. Related Links: Sustainability of Open Access Services Report Phase 1 and 2: Scoping the challenge and consulting the stakeholders
CORE participated in the annual Learn About Fair at the OU, which took place on 26 February 2014. The CORE team had the chance to present the CORE vision to a wide range of visitors, including PhD students, associate lecturers, developers and other academic staff. During the fair, the team demonstrated CORE applications, including the CORE portal search and the mobile application, showcased the CORE plugin, and explained other aspects of the service, such as the CORE API and the repository analytics. Several tutors expressed interest in using CORE as a research search engine, while a few developers explored opportunities to build their applications on top of the CORE API.
In June 2013, Jisc invited selected teams, through a closed tender, to bid for the UK National Open Access aggregator. The wide set of requirements included high coverage of UK institutional repositories, the ability to harvest and process data from different repository systems and the availability of a single harmonised API to data stored across UK repositories. The key factors for judging the bids were the availability and maturity of existing solutions, satisfaction of the technical criteria of the tender, and the timescale and cost required to meet additional tender criteria. The CORE team, based at KMi, prepared the proposal based on existing CORE solutions. The key components had already been implemented and were available from the CORE system. Most of the required services had in principle been included in the currently running DiggiCORE project, so adapting them to meet the tender specification was relatively straightforward. The KMi bid for the UK Open Access Aggregator was submitted before the 2nd July deadline. In July, Jisc announced that the KMi solution had won the tender and that CORE would be the UK national Open Access aggregator. KMi was then asked to prepare and negotiate a concrete project plan with Jisc which, beyond the tender requirements, asks CORE to network with a number of key stakeholders, in particular Google Scholar and OpenAIRE. On December 12th, Jisc issued the Grant Letter for UK Aggregation, though the project had already started in October. UK Aggregation is part of the Jisc Repositories Shared Services project. As the UK Open Access aggregator of institutional repositories, CORE provides new opportunities for services built on top of the aggregated content. Apart from supporting text mining, developers and the discoverability of content, CORE also offers opportunities for analysis and monitoring.
For example, HEFCE announced that research papers must be made available through an institutional repository immediately after publication in order to be eligible for submission to the post-2014 REF. If there is an agreed embargo period for open access availability, the rule takes it into account. As the UK Open Access aggregator, CORE can provide all the information needed to confirm the compliance of publications with the REF rules. The CORE team has developed a pilot application that allows users to monitor REF compliance. Petr Knoth and Zdenek Zdrahal were invited to the Workshop on repositories held at the HEFCE office in London on November 22nd, chaired by Dr Steven Hill, Head of Policy at HEFCE. Petr and Zdenek presented the CORE Compliance Analytics application to the other workshop participants. The application can support a range of stakeholders, including researchers, repository managers and HEFCE, in monitoring compliance with the HEFCE Open Access policy for the post-2014 REF. At present, CORE content consists of 18M+ records, with 1.8M+ full-text, machine-readable documents, from 612 institutional repositories worldwide. This includes all compliant UK institutional repositories. CORE services are used by the OU's ORO and a number of other institutions, including the European Library and UNESCO. In November 2013, CORE content was accessed by 150k+ unique visitors. Related Links: CORE
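The embargo-aware compliance rule described above can be illustrated with a small date check. This is a minimal sketch under stated assumptions, not the CORE Compliance Analytics implementation: it assumes a publication counts as compliant if its open access copy becomes available no later than the publication date plus any agreed embargo, and the names `add_months`, `is_compliant` and the optional `grace_days` parameter are hypothetical.

```python
from datetime import date, timedelta
import calendar
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the length of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_compliant(published: date,
                 oa_available: Optional[date],
                 embargo_months: int = 0,
                 grace_days: int = 0) -> bool:
    """Hypothetical rule: the OA copy must appear by publication + embargo (+ optional grace)."""
    if oa_available is None:
        # No open access copy was found in any repository, so the output cannot qualify.
        return False
    deadline = add_months(published, embargo_months) + timedelta(days=grace_days)
    return oa_available <= deadline


if __name__ == "__main__":
    # A six-month embargo on a paper published 1 May 2014 gives a deadline of 1 November 2014.
    print(is_compliant(date(2014, 5, 1), date(2014, 11, 1), embargo_months=6))  # True
    print(is_compliant(date(2014, 5, 1), date(2014, 11, 2), embargo_months=6))  # False
```

In a real monitoring tool, the aggregator would supply the repository availability dates; the exact deadline (for example, deposit on acceptance rather than on publication) should be taken from the policy text itself rather than from this sketch.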
On December 12th, Prof. Beth Plale of Indiana University, co-director and chair of the HathiTrust Research Center (HTRC), visited KMi. HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library. It helps researchers meet the technical challenges of dealing with massive amounts of digital text by developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to the growing digital record of human knowledge. During her short visit to the OU, Prof. Plale gave a presentation to KMi staff and guests from the OU Library and other OU departments about the organisation and current research activities of HTRC. Before and after her presentation, she discussed the challenges of document aggregation and text mining with Petr Knoth and Zdenek Zdrahal of KMi. Prof. Plale leads the US team in the joint proposal DiscoveryCORE, submitted to the third Digging into Data Call. The partners in the DiscoveryCORE bid are the Open University - KMi (UK), HTRC, Indiana University and the University of Illinois (US) and the European Library (NL). Related Links: Hathi Trust Research Center
On December 3rd 2013, the members' meeting of the United Kingdom Council of Research Repositories (UKCoRR), recognised as the main network of professionals supporting the uptake of Open Access in the UK, was hosted by the Knowledge Media Institute. The meeting was attended by about 50 delegates from a wide range of organisations, including universities, libraries, not-for-profits (EuroCRIS) and funders (Jisc, HEFCE, EPSRC). Many more followed the event online, as the meeting was streamed. The meeting was opened by Prof Peter Scott, who explained that the Open University's mission to deliver more open education and research is well aligned with the goals of UKCoRR. Peter also discussed the role of KMi within the OU and a number of KMi's achievements in the area of Open Access to educational resources. The rest of the meeting was moderated by the UKCoRR Chair, Yvonne Budden of the University of Warwick. In the morning session, Yvonne Budden informed the participants about important UKCoRR activities. In the following presentation, Ben Johnson of HEFCE explained HEFCE's position on Open Access, which currently looks like a game changer: HEFCE requires all research outputs submitted to the post-2014 REF to be made Open Access in order to be eligible. Five "Lightning talks" then introduced important challenges of Open Access publishing. In the afternoon session, Petr Knoth of KMi presented the current state of development of the CORE system, which aggregates OA content from repositories. Zdenek Zdrahal (KMi) then showed how CORE could be used for monitoring OA compliance for the post-2014 REF, and Loucas Anastasiou (KMi) discussed the issues in OAI-PMH harvesting and how they can be overcome. Chris Biggs (OU Library) demonstrated innovative repository benchmarking in Open Research Online (ORO).
In the last presentation, Nicky Whitsed, Director of Library Services at the OU, summarised the important issues of open access publishing and institutional repositories. Related Links: The United Kingdom Council of Research Repositories UKCoRR CORE
Using search engines effectively is now a key skill for researchers, but could more be done to equip young researchers with the tools they need? Here, Dr Neil Jacobs and Rachel Bruce from JISC's digital infrastructure team shared their top ten resources for researchers from across the web. CORE was placed among the top 10 search engines that go beyond Google. Related Links: The top ten search engines for researchers that go beyond Google
Today, Thursday 23rd May, His Royal Highness Andrew, The Duke of York, visited the Open University. His Royal Highness was met by Martin Bean, the Open University Vice-Chancellor, and Sir Henry Aubrey-Fletcher, Her Majesty's Lord-Lieutenant for Buckinghamshire. The Duke met with a range of eminent guests and friends of the Open University and unveiled a plaque in the JLB Nexus area to commemorate his visit. During the tour of Open University innovation highlights, the Duke met with KMi Director, Professor Peter Scott, and KMi research student Drahomira Herrmannova. Peter introduced the Duke to our interactive book research and work in iTunes U and provided a perspective on 'post personal computing', and Dasha discussed 'Big Data' and learning analytics research. Other University highlights included a discussion of our new FutureLearn venture, our new Open Educational Resource work in OpenLearn, and the new LTS App 'OU Anywhere'. Related Links: The Duke of York KMI Interactive Books Project
CORE has been placed among the Top 100 Thesis & Dissertation References on the Web by OnlinePhDProgram.org. The list was published yesterday. OnlinePhDProgram.org is dedicated to helping future doctoral candidates find the right program that meets their needs, desires and goals. The site offers helpful blog posts, articles and a wealth of other information that can answer questions about online Ph.D. programs. Related Links: The Top 100 Thesis Dissertation References on the Web list
Under a cloudless sky in The Hague, the Netherlands, the Europeana Cloud project kicked off on 4 and 5 March. The event was attended by about 70 delegates from the partner institutions and also by the head of the responsible European Commission unit. One of the important tasks of the kick-off was to further discuss the infrastructure requirements that will be used to select and shape the type of Cloud to be developed. This initial meeting on 4-5 March marked the official start of three years of collaboration between 35 partners. It is a diverse group, including representatives of libraries, research infrastructures, developers, publishers and researchers. They come from many different backgrounds but nevertheless share a common goal of establishing a cloud-based system for Europeana and its aggregators. Europeana Cloud is a €4 million Best Practice Network coordinated by the Europeana Foundation, designed to establish a cloud-based system for Europeana and its aggregators. Europeana Cloud will deliver new content, new metadata, a new distributed storage system, new tools and services for researchers and a new platform, Europeana Research. Content providers and aggregators across the European information landscape urgently need a cheaper, more sustainable technical infrastructure that is capable of storing both metadata and content. Researchers require a digital space where they can undertake innovative exploration and analysis of Europe's digitised content. Europeana needs to get closer to the target of 30 million items by 2015. KMi is the partner with the second highest number of person-months (after the Europeana Foundation) out of 33 partners. KMi was invited to the project based on our experience in content aggregation and text-mining acquired in the CORE family of projects.
Apart from developing the Cloud specification, reviewing existing Cloud technologies and assessing their suitability for Europeana, KMi will also be responsible for experimenting with different models for identifying semantically related content from a database of around 30 million objects. This technology will then be provided as a service of the Cloud. Related Links: Europeana Cloud Kicks Off Under Clear Skies
According to the organisers, the KMi team consisting of Petr Knoth, Drahomira Herrmannova and Zdenek Zdrahal achieved the best overall results in the English to Chinese, Japanese and Korean (English to CJK) task of the NTCIR-10 CrossLink evaluation competition, and was among the top performers (consistently among the three best and mostly second best) in the CJK to English task. Ten international teams took part in the evaluation. This is the second time the KMi team has participated in this competition. NTCIR is a major forum (similar to TREC) of evaluation workshops designed to enhance research in Information Access (IA) technologies, including information retrieval, question answering, text summarisation, extraction, etc. The NTCIR-10 conference will take place, as usual, in Tokyo, Japan this June. The CrossLink task (Cross-Lingual Link Discovery, CLLD) is a way of automatically finding potential links between documents in different languages. It is not directly related to traditional cross-lingual information retrieval (CLIR): CLIR can be viewed as a process of creating a virtual link between the provided cross-lingual query and the retrieved documents, whereas CLLD actively recommends a set of meaningful anchors in the source document and uses them as queries, together with contextual information from the text, to establish links with documents in other languages. Wikipedia is an online multilingual encyclopaedia that contains a very large number of articles covering most written languages, and it includes extensive hypertext links between documents in the same language for easy reading and referencing. However, pages in different languages are rarely linked, except for the cross-lingual links between pages about the same subject. This can pose serious difficulties for users who try to seek information or knowledge from sources in different languages. Cross-lingual link discovery therefore tries to break the language barrier in knowledge sharing.
With CLLD, users are able to discover documents in languages they are familiar with, or in languages that have a richer set of documents than their language of choice. Related Links: NTCIR-10
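The CLLD process described above (recommend anchors in the source document, then use each anchor plus its surrounding context as a query against documents in another language) can be sketched in a few lines. This is a toy illustration, not the KMi team's NTCIR-10 system: it assumes a hypothetical bilingual lexicon for translation, a fixed list of candidate anchor phrases, whitespace-tokenised target documents (pinyin tokens stand in for Chinese text) and simple weighted term overlap for scoring. All names are illustrative.

```python
from collections import Counter

# Hypothetical toy data: an English-to-Chinese lexicon (pinyin) and two target-language pages.
LEXICON = {"panda": "xiongmao", "bamboo": "zhuzi", "china": "zhongguo",
           "yangtze": "changjiang", "river": "heliu"}
TARGETS = {"zh:xiongmao": "xiongmao zhuzi zhongguo dongwu",
           "zh:changjiang": "changjiang heliu zhongguo shanghai"}


def translate(tokens, lexicon):
    """Map source-language tokens to target-language tokens, dropping unknown words."""
    return [lexicon[t] for t in tokens if t in lexicon]


def find_anchors(words, anchor_vocab):
    """Return (anchor, start, length) for the first occurrence of each candidate phrase."""
    spans = []
    for anchor in anchor_vocab:
        parts = anchor.split()
        for i in range(len(words) - len(parts) + 1):
            if words[i:i + len(parts)] == parts:
                spans.append((anchor, i, len(parts)))
                break
    return spans


def discover_links(source_text, anchor_vocab, lexicon, target_docs,
                   window=3, context_weight=0.5):
    """Link each anchor to the best-matching target document, using nearby words as context."""
    words = source_text.lower().split()
    links = {}
    for anchor, start, length in find_anchors(words, anchor_vocab):
        # Context = up to `window` words on each side of the anchor.
        context = words[max(0, start - window):start] + words[start + length:start + length + window]
        query = Counter(translate(anchor.split(), lexicon))  # anchor terms weigh 1 each
        for term in translate(context, lexicon):
            query[term] += context_weight                    # context terms weigh less
        best_id, best_score = None, 0.0
        for doc_id, doc_text in target_docs.items():
            doc_terms = set(doc_text.split())
            score = sum(w for t, w in query.items() if t in doc_terms)
            if score > best_score:
                best_id, best_score = doc_id, score
        if best_id is not None:
            links[anchor] = best_id
    return links


if __name__ == "__main__":
    text = "The panda eats bamboo near the Yangtze river in China"
    print(discover_links(text, ["panda", "yangtze river"], LEXICON, TARGETS))
    # {'panda': 'zh:xiongmao', 'yangtze river': 'zh:changjiang'}
```

The context window is what separates this from plain query translation: both anchors share the word "China", but each anchor's own translated terms dominate the score, so they link to different pages.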
The first multi-stakeholder review of the achievements of the World Summit on the Information Society (WSIS+10), entitled "Towards Knowledge Societies for Peace and Sustainable Development", was hosted by UNESCO in Paris from 25 to 27 February 2013. The event attracted about 1000 participants from around the world, and about 1500 additional people participated online. Zdenek Zdrahal was invited to participate in the high-level roundtable entitled "24. Using E-Science to Strengthen the Interface between Science, Policy and Society". The panel was chaired by the Assistant Director-General for Natural Sciences, Ms Gretchen Kalonji, and included seven members (government ministers, ambassadors, the Head of the Digital Science Unit of the European Commission and a former Chief Scientific Advisor to a UK Government department). The aim of the roundtable was to explore the opportunities and challenges of using e-Science to support decision making in science policy, to look at the technical requirements for designing a web-based platform to support decision making in science policy, and to share experiences gained from developing similar platforms. Zdenek Zdrahal presented the CORE system for aggregating, semantically enriching and accessing open access scientific papers, and explained the potential of CORE for supporting novel approaches to e-Science. As an example, he presented the conference portal "UNESCO Repository for Connecting Local and International Content (CLIC)", developed for UNESCO by the CORE team at KMi. Documents submitted to UNESCO conferences through CLIC are semantically enriched and linked to the most similar scientific papers aggregated by CORE from the world's open access repositories. Since one of the WSIS+10 hot topics discussed by many participants from governments, industry and academia was "broadband learning", the possibility of using CORE services for projects like FutureLearn was also outlined.
The presentation was followed by a number of informal meetings with WSIS+10 participants, where the future directions of CORE development and the possibilities of using CORE services were discussed. Related Links: schedule of events, agenda, participants CLIC portal WSIS10 presentation
The eCloud (Europeana Cloud: Unlocking Europe's Research via The Cloud) project is about to start on the 1st of February 2013. Europeana Cloud is a €4 million project coordinated by the Europeana Foundation, designed to establish a cloud-based system for Europeana and its aggregators. Europeana Cloud will deliver new content, new metadata, a new distributed storage system, new tools and services for researchers and a new platform, Europeana Research. Content providers and aggregators across the European information landscape urgently need a cheaper, more sustainable technical infrastructure that is capable of storing both metadata and content. Researchers require a digital space where they can undertake innovative exploration and analysis of Europe's digitised content. Europeana needs to get closer to the target of 30 million items by 2015. KMi is the partner with the second highest number of person-months (after the Europeana Foundation) out of 33 partners. KMi was invited to the project based on our experience in content aggregation and text-mining acquired in the CORE family of projects. Apart from building the eCloud infrastructure, KMi will also be responsible for experimenting with different models for identifying semantically related content from a database of around 30 million objects. This technology will then be provided as a Cloud service. Related Links: Europeana Cloud CORE
KMi, together with the University of Nottingham (CRC) and Cottage Labs, has been awarded a grant in the JISC Digital Infrastructure Programme to build a UK Open Access Repository Registry. KMi was invited to participate in this closed call directly by JISC based on our work in the CORE project. It has already been decided that the resulting software will become an essential component of UK RepositoryNet+, which will guarantee its long-term sustainability. RepNetRegistry will build on our experience supporting OpenDOAR to provide an advanced, data-driven infrastructure that maximises the potential for use with third-party services, such as aggregators, cross-search tools or multiple-deposit interfaces, by exposing authoritative, quality-controlled data through a RESTful API. In the context of this project, the CORE team will be responsible for collecting repository statistics from across all UK repositories and providing them as authoritative repository benchmarks to the new Open Access Repository Registry.
The Open University has widened access to academic research material, available through its Open Access search facility CORE, thanks to technical leaps in this innovative system created by the OU's Knowledge Media Institute (KMi). CORE has seen unprecedented success in the past year and has more than tripled in size, now offering content from a global network of repositories, freely available to scholars worldwide. CORE (COnnecting REpositories) provides a large, easy-to-search database to help academics, researchers and students find, explore and download research papers. When the service was first launched in 2011, CORE could source material from 60 repositories; today it aggregates data from over 230 internationally, plus content from thousands of Open Access journals acquired through the Directory of Open Access Journals (DOAJ). This means the service holds more than nine million metadata items and about half a million full-text files. Funding from JISC is allowing the project to develop further analytical processes through the DiggiCORE project, which will utilise social media tools. Unlike other Open Access scholarly search systems, CORE aggregates full-text files, not only metadata, and therefore ensures that publication full texts are freely available for download. Users of commercial academic search systems, such as Google Scholar, can be denied access to the full article, particularly when subscription fees are required, which is often frustrating for scholars. CORE specialises in searching the full-text items held in approved Open Access repositories, ensuring a vastly improved level of accessibility for users: anyone searching for full texts on CORE will be able to download all the content they discover. CORE offers a unique application programming interface (API) that makes it possible for others to easily build applications utilising the Open Access content. The CORE API has a lot of potential.
"For example, it allowed us to build an application that enables people to search for Open Access content from mobile devices or to develop a content recommendation plug in for libraries," says Peter Knoth, the software designer and founder of the CORE system. The reason for CORE's success rise is clear, says Peter: "A huge amount of research papers has been available online as Open Access, but there was limited technical infrastructure that would support different kinds of users in exploiting it. CORE is not only a search system, it is a free platform for developing applications that need access to the full-text of research articles. A very large amount of data is now available through the CORE API. The CORE Linked Open Data repository has this month already grown to 100 million RDF triples making it by far the largest Linked Open Data repository at the Open University. "CORE has created a resource which offers some intriguing possibilities. The API to the aggregation puts this valuable information into the hands of researchers and developers and offers them the chance to use it in new and better ways." says Andy McGregor, the JISC manager of the Resource Discovery programme. "The strength of CORE is that it can be applied in multiple scenarios. In addition to searching for scientific publications, we expect the CORE infrastructure to be used for analytical and research purposes, " says Zdenek Zdrahal, the director of the CORE project. "The CORE platform has become a basis for the development of new services and motivates further research," says Zdenek. In the currently running JISC funded DiggiCORE project, which is a collaboration of the Open University and the European Library, the CORE system is used as a platform for analysing networks of research publications to help better understand the properties of high impact publications and influential authors. But components of the CORE system are also likely to find its use in future projects. 
CORE is now available for flexible use online and on mobile devices and tablets and is already benefiting journals, scholars, at conferences and as technical support answering the demand for Open Access to academic research papers. Related Links: The CORE project page
How do you mark the start of the largest experiment anywhere in the world to test a nation-wide mobilization of mobile learning in higher education? On September 25th 2012, His Excellency Sheikh Nahayan Mabarak Al Nahayan, UAE Minister for Higher Education & Scientific Research, inaugurated the First Annual Global Mobile Learning Congress 2012 at the United Arab Emirates University in Al Ain. The congress marked the achievement of the Federal Mobile Learning Initiative (initiated only in April 2012), which has prepared the three federal higher education institutions of the UAE to introduce iPad-based teaching and learning for Foundation Program students entering this September, starting an exciting programme to explore post-personal-computer learning at scale. Some 14,000 students will use Apple's new iPad to "learn different". KMi Director Peter Scott presented the Open University's perspective on learning in a post-personal-computer world at the congress, as part of a series of international guest presentations intended both to celebrate the launch and to help the UAE team think carefully about how they will track and evaluate its impact as the Foundation students progress in their work. Other congress speakers included Dr. Ruben Puentedura, Founder and President of the US-based consulting firm Hippasus, which focuses on transformative applications of information technologies to education; Dr. James Ashby, President and Chief of Psychometrics at CORE Edutech, USA, a leading innovator in research-based education designs for elementary, secondary and higher education; and Apple Distinguished Educator David Baugh, who recounted the great success of the School in a Box initiative for remote schools, villages and towns in places such as Nepal. Federal Mobile Learning Initiative Chairman Dr. Tayeb Kamali noted that this event was just the start of an important and ongoing collaborative project among the three federal higher education institutions to help boost students' learning outcomes. It is a very exciting time to be a learner in the UAE! And yet more exciting to be a teacher in this new world. Related Links: The HCT News story on the iPad Launch
Drahomira Herrmannova was awarded the Prize of Zdena Rabova by the Brno University of Technology. This prize is awarded annually by the dean of the faculty to two students for excellent academic and scientific results. The nomination was supported by Drahomira's diploma thesis, which she wrote during her time at KMi and which was based on a paper by Drahomira and Petr Knoth presented at the JCDL 2012 conference.
The 7th International Conference on Open Repositories (OR 2012), held last week, attracted close to 500 participants, the highest number in its history. The theme and title of OR 2012 in Edinburgh, Open Services for Open Content: Local In for Global Out, reflect the current move towards open content, 'augmented content', distributed systems and data delivery infrastructures. This is a very good fit with what CORE (core.kmi.open.ac.uk) offers, and the CORE system developed in KMi had a very active presence. Petr Knoth presented different aspects of the CORE system in a talk, at a poster session (with Owen Stephens) and during the developers' challenge. CORE was also discussed in a number of presentations by other participants not directly linked to the Open University, perhaps most importantly in the UK RepositoryNet+ project presentation. UK RepositoryNet+ is a socio-technical infrastructure, funded by JISC, that supports the deposit, curation and exposure of Open Access research literature. It aims to provide a stable socio-technical infrastructure at the network level to maximise the value of this investment to UK HE, supporting a mix of distributed and centrally delivered service components with proactive management, operation, support and outcome. While this infrastructure will be designed to meet the needs of UK research, it is set within, and must operate effectively in, a global context. UK RepositoryNet+ considers the CORE system an important component of this infrastructure. What the CORE approach shares with that of William Wallace, the Scottish hero, is the determination to fight for freedom: in this case, freedom of access to content. There is, hopefully, also one difference. We hope CORE will not end up the same way as William Wallace... We will see :-) Related Links: OR2012 William Wallace
KMi and The European Library/Europeana jointly organised the 1st International Workshop on Mining Scientific Publications, associated with JCDL 2012, the most prestigious conference in the world of digital libraries. The workshop was attended by major players in the field, including the National Library of Medicine, the Library of Congress, CiteSeerX, Elsevier and the British Library. Although Barack didn't come in the end, the workshop was very successful, the only problem being the lack of chairs in the room. We (the workshop organisers: Petr Knoth, KMi; Zdenek Zdrahal, KMi; and Andreas Juffinger, The European Library/Europeana) were encouraged by the community's positive response to the importance of the issues researchers face when mining research publications to improve the way research is carried out and evaluated. A paper by Drahomira (aka Dasha) Herrmannova and Petr Knoth (both KMi), entitled 'Visual search for supporting content exploration in large document collections' and presented by Dasha during the workshop, received encouraging feedback. Another KMi talk was given by Petr, who discussed the issues in current digital library aggregation systems, especially those focusing on Open Access, and explained the advantages offered by the CORE system in a presentation titled "COnnecting REpositories (CORE): Aggregating and Enriching Content to Support Open Access." All papers presented at the workshop are available on the workshop web page below. Related Links: 1st International Workshop on Mining Scientific Publications
The University of London Computing Centre has published an article featuring certain aspects of the CORE system. Check it out... Related Links: Cor! It's time for CORE!
KMi was contracted by JISC to create EDUKApp, an educational, UK-wide app and widget store. EDUKApp will be both a repository and a community site focused on collecting and promoting widgets and apps for learning and teaching. The first prototype of EDUKApp was presented at the JISC CETIS conference, which took place in Nottingham on Feb 22-23 under the motto "The Future Just Happened? Technology Innovation in Universities and Colleges". KMi researchers Fridolin Wild, Lucas Anastasiou and Alexander Mikroyannidis used a half-day workshop to introduce attending academics, government-level education officials, researchers and software evangelists to personal learning environments, widgets and the new store, receiving positive reviews and feedback.
DiggiCORE is a new two-year project funded under the Digging into Data programme, which supports collaboration between the UK, USA, Canada and the Netherlands. The DiggiCORE partnership consists of KMi and The European Library, which makes DiggiCORE the only fully European project funded in the whole programme. The members of the DiggiCORE Advisory Board represent The Open University, SPARC Europe and the Europeana Foundation. The objective of DiggiCORE is to analyse a vast set of research publications from the Open Access domain using natural language processing and social network analysis methods in order to identify patterns in the behaviour of research communities, recognise trends in research disciplines, gain new insights into the citation behaviour of researchers and discover features that distinguish high-impact papers. The results of this analysis should enable the development of better methods for exploratory search and browsing in digital collections, and should encourage new ways of evaluating research or researchers' impact beyond standard citation measures. To enable the analysis, the DiggiCORE project will extend and improve the CORE system, providing access to well-structured and organised information acquired by harvesting, cleaning, integrating and processing information from a very large and fast-growing collection of research publications distributed across more than 1,800 Open Access repositories and many Open Access journals. The Open University's Open Research Online (ORO) is among the harvested institutional repositories. The DiggiCORE infrastructure will be freely accessible to the public through a set of web services. Related Links: Eight international research funders announce winners of 2011 Digging into Data challenge DiggiCORE project plan Winners of the Digging into Data programme
ServiceCORE is a follow-up project to CORE, funded by JISC. The project aims to develop a nation-wide service for searching, navigating and accessing Open Access publications stored across 143 British institutional repositories. The CORE system is unique in the way it uses text mining and Linked Data to connect and interlink semantically similar publications at the level of full texts. Within ServiceCORE, this functionality will be extended to metadata records as well. The fact that KMi has been funded to extend the current CORE system with new functionality and to establish it as a British service is a great challenge as well as an acknowledgement of CORE's success and wide impact. Please read this news story to see what JISC says about CORE: http://www.jisc.ac.uk/news/stories/2011/09/openaccess.aspx. The ServiceCORE project benefits from a very strong Advisory Board made up of members of OpenDOAR, UKOLN, MIMAS and The European Library. Related Links: CORE on JISC website CORE portal CORE video
The KMi submission by Petr Knoth, Vojtech Robotka and Zdenek Zdrahal, entitled "Connecting Repositories in the Open Access Domain using Text Mining and Semantic Data", won the Best Poster/Demo Award at the International Conference on Theory and Practice of Digital Libraries (TPDL 2011), which is taking place this week in Berlin, Germany. The European Conference on Research and Advanced Technology for Digital Libraries (ECDL) has been the leading European scientific forum on digital libraries for 14 years. For its 15th year, the conference was renamed the International Conference on Theory and Practice of Digital Libraries (TPDL). Related Links: CORE
The KMi team of Petr Knoth, Lukas Zilka and Zdenek Zdrahal placed first in the NTCIR CrossLink competition in the manual assessment category for A2F P@5, and placed consistently in the top three in the other categories. Twelve international teams took part in the evaluation. NTCIR is a major forum of evaluation workshops (similar to TREC) designed to enhance research in Information Access (IA) technologies, including information retrieval, question answering, text summarization and extraction. NTCIR-9 will take place, as usual, in Tokyo, Japan this December. The CrossLink task (Cross-Lingual Link Discovery, CLLD) is a way of automatically finding potential links between documents in different languages. It is not directly related to traditional cross-lingual information retrieval (CLIR): CLIR can be viewed as a process of creating a virtual link between the provided cross-lingual query and the retrieved documents, whereas CLLD actively recommends a set of meaningful anchors in the source document and uses them as queries, together with contextual information from the text, to establish links with documents in other languages. Wikipedia is an online multilingual encyclopaedia that contains a very large number of articles covering most written languages, and it includes extensive hypertext links between documents in the same language for easy reading and referencing. However, pages in different languages are rarely linked, except for the cross-lingual links between pages about the same subject. This can pose serious difficulties for users who try to seek information or knowledge from sources in different languages. Cross-lingual link discovery therefore tries to break the language barrier in knowledge sharing. With CLLD, users are able to discover documents in languages which they are either familiar with, or which have a richer set of documents than their language of choice.
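For readers unfamiliar with the metric, P@5 is commonly read as precision at rank 5: the fraction of a system's top five suggested links that assessors judged relevant. The minimal sketch below computes it for a single anchor. The page titles and relevance judgements are invented for illustration, and the official NTCIR-9 CrossLink evaluation (including how A2F results are grouped and averaged across topics) follows its own protocol, which this sketch does not attempt to reproduce.

```python
from typing import Sequence, Set


def precision_at_k(ranked_targets: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Return the fraction of the top-k ranked link targets that are relevant.

    The denominator is always k, so a run that returns fewer than k
    suggestions is penalised for the empty slots.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for target in ranked_targets[:k] if target in relevant)
    return hits / k


# Hypothetical run: ranked Chinese Wikipedia pages suggested for one English anchor.
suggestions = ["zh:Beijing", "zh:Shanghai", "zh:Great_Wall",
               "zh:Yellow_River", "zh:Yangtze", "zh:Tibet"]
# Hypothetical assessor judgements for the same anchor.
judged_relevant = {"zh:Beijing", "zh:Great_Wall", "zh:Yangtze", "zh:Tibet"}

print(precision_at_k(suggestions, judged_relevant))
```

Three of the first five suggestions are relevant, so the score is 0.6. "zh:Tibet" is also judged relevant but is ranked sixth, so it does not count towards P@5.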
Linked Data, the sharing of data on the Web at large scale through the use of URIs, the HTTP protocol and RDF, is gaining uptake at an ever-increasing rate. As is commonly known, this technology is now supported by a number of large players, including the UK Government, the BBC, Google, Yahoo and, to some extent, Facebook. In addition to the relative simplicity of the principles and technologies underlying Linked Data, a second major reason for its growing popularity is that it is supported by a number of high-quality, industrial-strength tools. As a result, a number of KMi-related projects are now deploying Linked Data here at the OU. This first summit provided an opportunity for these projects to outline what has been achieved thus far, and for KMi in general to discuss plans and priorities for future development and deployment. Specifically, the event covered the following projects:
• LUCERO - the project which set up data.open.ac.uk, the first Linked Data site in the UK (and probably the world) for a higher education establishment.
• RADAR - which supports the analysis and management of OU research data, thus aiding the OU REF submission.
• CORE - connecting together disparate scientific repositories, enabling them to be searched as a single resource.
• Annomation and SugarTube - which enable, respectively, the annotation of and semantic search over BBC archives. These tools aim to support OU course teams who wish to find relevant video segments related to a specific OU course topic.
• UCIAD - which uses Linked Data to support the analysis of user activity across OU systems.
Overall the event was very successful, showcasing a number of innovations and also how useful Linked Data already is to the OU business.
One of the main action points of the meeting is that KMi will support the creation of a new OU-wide Linked Data portal which will: act as a central repository for all relevant resources; document relevant activities; and act as an OU Linked Data showcase. Related Links: The LUCERO presentation The LUCERO website The RADAR presentation The RADAR website internal OU access only The CORE presentation The CORE website The UCIAD presentation The UCIAD website
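To make the "URIs plus RDF" idea from the summit report concrete, the sketch below serialises a few RDF triples in the W3C N-Triples syntax, one of the simplest RDF formats. It is a hand-rolled illustration that assumes plain string literals only (no language tags or datatypes). The example.org URIs are hypothetical; `dcterms:title` and `dcterms:relation` are real Dublin Core terms, but nothing here claims that data.open.ac.uk or CORE use them. A production system would normally rely on an RDF library rather than formatting triples by hand.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain RDF string literal (no language tag or datatype)."""
    value: str


# (subject URI, predicate URI, object URI or Literal)
Triple = Tuple[str, str, Union[str, Literal]]


def format_uri(value: str) -> str:
    # N-Triples writes IRIs in angle brackets; this sketch assumes the
    # input is already a valid absolute IRI and does not validate it.
    return f"<{value}>"


def format_literal(lit: Literal) -> str:
    # Escape the characters N-Triples does not allow raw inside a quoted string.
    escaped = (lit.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        obj_term = format_literal(obj) if isinstance(obj, Literal) else format_uri(obj)
        lines.append(f"{format_uri(subject)} {format_uri(predicate)} {obj_term} .")
    return "".join(line + "\n" for line in lines)


DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical paper URIs; a real deployment would use its own namespace.
PAPER = "http://example.org/paper/1"
RELATED = "http://example.org/paper/2"

example_triples = [
    (PAPER, DCTERMS + "title", Literal("Connecting Repositories")),
    (PAPER, DCTERMS + "relation", RELATED),
]
print(to_ntriples(example_triples), end="")
```

Each output line is one subject-predicate-object statement terminated by " .", which is why very large Linked Data collections are usually counted in triples.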
On Wednesday 23rd March 2011, the Eurogene project led by Dr Zdenek Zdrahal was featured in the print edition of The Times, in the article "Gene genie's treasure trove" by Mark Frary. Drawing on an interview with Zdenek and Petr, the article discusses the results and mission of Eurogene: to provide free multimedia learning resources in ten languages for statistical, medical and molecular genetics, and to deliver them to students and professionals using KMi technology.
The COnnecting REpositories (CORE) project officially started today with a kick-off meeting attended by representatives from JISC, OpenDOAR, UKOLN, the OU Library and KMi. The CORE project aims to facilitate access to, and navigation across, relevant scientific papers stored in Open Access repositories. The project will create a new open metadata repository, available in the Linked Data format, describing the semantic relatedness between research articles stored across a selection of UK repositories, including the Open University's Open Research Online (ORO). This will be achieved by harvesting and processing full-text content using NLP techniques for automatic link discovery. The CORE project will also develop a web service and a demonstrator client that will allow UK repositories to direct their users easily to relevant full-text Open Access content stored elsewhere. The usability of this service will be demonstrated on the ORO repository by automatically recommending links to related content in other repositories. CORE will also focus on developing good practice for service reuse and uptake, in collaboration with UKOLN and OpenDOAR. Related Links: CORE project website
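The kick-off announcement says CORE will discover links between semantically related papers by processing their full texts with NLP techniques, but it does not say which technique. A common baseline for this kind of relatedness is to represent each document as a TF-IDF vector and link pairs whose cosine similarity exceeds a threshold. The minimal sketch below does exactly that with the Python standard library. The tokenizer, the smoothed IDF formula, the 0.2 threshold and the document identifiers are all assumptions for illustration, not CORE's actual pipeline.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a small weight.
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(docs: Dict[str, str], threshold: float = 0.2) -> List[Tuple[str, str, float]]:
    """Return document pairs whose TF-IDF cosine similarity meets the threshold."""
    vectors = tfidf_vectors(docs)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical full-text snippets keyed by hypothetical repository identifiers.
docs = {
    "oro:1": "open access repositories harvesting full text metadata",
    "repoA:7": "harvesting metadata and full text from open access repositories",
    "repoB:3": "protein folding in yeast cells",
}
for a, b, score in related_pairs(docs):
    print(f"{a} <-> {b}: {score:.2f}")
```

In this toy collection only the first two records share vocabulary, so they form the single linked pair (cosine roughly 0.82), while the unrelated biology record stays unlinked. Comparing every pair is quadratic in the collection size; that is fine for a sketch, but an aggregation spanning many repositories would need an index or approximate nearest-neighbour search instead.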
KMi members were very much in evidence at the 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2010), which took place in Lisbon on 11-15 October. First held in 1987, EKAW is the main European forum for research in knowledge technologies. Two prestigious awards were brought home by KMi members. The first, the prize for best student paper, was won by Fouad Zablith for his paper "Using Ontological Contexts to Assess the Relevance of Statements in Ontology Evolution", written in collaboration with Mathieu d'Aquin, Marta Sabou and Enrico Motta. Another KMi member, Miriam Fernandez, won the Best Poster award for her work on "Predicting the quality of semantic relations by applying Machine Learning classifiers", in collaboration with Marta Sabou, Petr Knoth and Enrico Motta. KMi's influential role in this research community was also confirmed by the three keynotes given by KMi members at the main conference and associated workshops. Enrico Motta gave a keynote at the main conference on "Realizing Smart Products" and was also an invited speaker at the workshop on Context, Information and Ontologies, while Mathieu d'Aquin gave a keynote at the Personal Semantic Data workshop. Finally, papers, posters and demos were also presented by Ning Li, Vanessa Lopez and Stefan Dietze. In sum, EKAW 2010 turned out to be yet another exciting and high-profile event which confirmed KMi's international status at the forefront of research and development in knowledge technologies. Related Links: EKAW 2010
Knowledge Media Institute
The Open University