http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/api.php?action=feedcontributions&user=Skyts0401&feedformat=atom
Crop Genomics Lab. - User contributions [en]
2024-03-29T09:56:01Z
User contributions
MediaWiki 1.21.3
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Working)_Pokemon_Go_Pokecoins_Generator_2023_No_Verification
(Working) Pokemon Go Pokecoins Generator 2023 No Verification
2023-02-22T20:43:05Z
<p>Skyts0401: Created page with "Welcome to the ultimate guide to Pokemon Go Pokecoins Generator, Pokemon Go Pokecoins Cheats, Pokemon Go Pokecoins Hack, Pokemon Go Generator 2023, Pokemon Go Generator No Ver..."</p>
<hr />
<div>Welcome to the ultimate guide to Pokemon Go Pokecoins Generator, Pokemon Go Pokecoins Cheats, Pokemon Go Pokecoins Hack, Pokemon Go Generator 2023, Pokemon Go Generator No Verification, Pokemon Go Generator Ios, Pokemon Go Generator Android, Pokemon Go Codes, Pokemon Go Resource Generator and Pokemon Go Pokecoins for Free!<br />
<br />
Are you hoping to find the best and most up-to-date Pokemon Go Pokecoins Generator, Pokemon Go Pokecoins Cheats, Pokemon Go Pokecoins Hack, Pokemon Go Generator 2023, Pokemon Go Generator No Verification, Pokemon Go Generator Ios, Pokemon Go Generator Android, Pokemon Go Codes, Pokemon Go Resource Generator and Pokemon Go Pokecoins for Free? Look no further – we’ve got the best guide for you!<br />
<br />
First off, let’s define what a Pokemon Go Pokecoins Generator is and how it works. A Pokemon Go Pokecoins Generator is a tool that helps you generate free Pokecoins and resources for the popular mobile game Pokemon Go. By using this tool, you can get unlimited Pokecoins, resources, and even rare items in Pokemon Go.<br />
<br />
Now that we’ve defined what a Pokemon Go Pokecoins Generator is, let’s talk about the different types of Pokecoins Generators available in the market. There are two types of Generators that you can use - the Online Pokecoins Generator and the Offline Pokecoins Generator.<br />
<br />
The Online Pokecoins Generator is the most popular type of Generator. It requires you to enter your details, such as the game name, your username and password, and other information. After that, the Generator will generate the Pokecoins for you. The great thing about this type of Generator is that you don’t need to download anything – the Generator works online and you can use it anywhere.<br />
<br />
On the other hand, the Offline Pokecoins Generator is a bit more complicated. This Generator requires you to download the Generator software to your computer. After downloading the software, you will have to enter your details, such as the game name, your username and password, and other information. After that, the Generator will generate the Pokecoins for you. However, the Offline Pokecoins Generator is only available for Windows computers.<br />
<br />
Now that you know about the two types of Generators, let’s go over the different types of Pokemon Go Pokecoins Cheats, Pokemon Go Pokecoins Hack, Pokemon Go Generator 2023, Pokemon Go Generator No Verification, Pokemon Go Generator Ios, Pokemon Go Generator Android, Pokemon Go Codes, Pokemon Go Resource Generator and Pokemon Go Pokecoins for Free that you can use.<br />
<br />
The most common type of Cheat is the “Unlimited Pokecoins” Cheat. This Cheat allows you to get unlimited Pokecoins, resources, and rare items in the game. All you need to do is enter your details, such as the game name, your username and password, and other information. After that, the Cheat will generate the Pokecoins for you.<br />
<br />
Another type of Cheat is the “Unlimited Resources” Cheat. This Cheat allows you to get unlimited resources, such as coins, gems, and other items in the game. All you need to do is enter your details, such as the game name, your username and password, and other information. After that, the Cheat will generate the resources for you.<br />
<br />
Finally, the last type of Cheat is the “Unlock All Items” Cheat. This Cheat allows you to unlock all of the items in the game. All you need to do is enter your details, such as the game name, your username and password, and other information. After that, the Cheat will generate the items for you.<br />
<br />
Now that you know about the different types of Generators, Cheats and Hacks, it’s time to find out how to use them. Luckily, most of these tools are very easy to use. All you need to do is enter your details, such as the game name, your username and password, and other information. After that, the tool will generate the items or resources for you.<br />
<br />
We hope that this guide has helped you understand the different types of Pokemon Go Pokecoins Generator, Pokemon Go Pokecoins Cheats, Pokemon Go Pokecoins Hack, Pokemon Go Generator 2023, Pokemon Go Generator No Verification, Pokemon Go Generator Ios, Pokemon Go Generator Android, Pokemon Go Codes, Pokemon Go Resource Generator and Pokemon Go Pokecoins for Free. Now you can enjoy playing Pokemon Go without any worries and get the most out of it!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Unlimited-)_Free_Merge_Dragons_Gems_Generator_2023_Updates_No_Human_Verification
(Unlimited-) Free Merge Dragons Gems Generator 2023 Updates No Human Verification
2023-02-22T20:41:10Z
<p>Skyts0401: Created page with "Are you looking for a Merge Dragons Gems Generator, Merge Dragons Gems Cheats, Merge Dragons Gems Hack, Merge Dragons Generator 2023, Merge Dragons Generator No Verification, ..."</p>
<hr />
<div>Are you looking for a Merge Dragons Gems Generator, Merge Dragons Gems Cheats, Merge Dragons Gems Hack, Merge Dragons Generator 2023, Merge Dragons Generator No Verification, Merge Dragons Generator Ios, Merge Dragons Generator Android, Merge Dragons Codes, Merge Dragons Resource Generator, Merge Dragons Gems For Free?<br />
<br />
If you are a fan of Merge Dragons, then you have come to the right place! We have all the answers for you!<br />
<br />
Merge Dragons is a popular game for iOS and Android. It is exciting and interactive, and it allows you to build your own unique dragon kingdom. But, as with most games, it also has its own currency, which is Gems.<br />
<br />
Gems are necessary for you to progress in the game and to make your dragon kingdom the best it can be. But they can be hard to come by and can be expensive to buy. That’s why so many players are looking for a Merge Dragons Gems Generator, Merge Dragons Gems Cheats, Merge Dragons Gems Hack, Merge Dragons Generator 2023, Merge Dragons Generator No Verification, Merge Dragons Generator Ios, Merge Dragons Generator Android, Merge Dragons Codes, Merge Dragons Resource Generator, and Merge Dragons Gems For Free.<br />
<br />
These tools are designed to provide players with free Gems and resources. All you need to do is enter your Merge Dragons account information and you can get free Gems and resources quickly and easily. Plus, the newest Merge Dragons Generators are completely safe and secure to use. So there’s no need to worry about your account being compromised.<br />
<br />
The best thing about Merge Dragons Gems Generator, Merge Dragons Gems Cheats, Merge Dragons Gems Hack, Merge Dragons Generator 2023, Merge Dragons Generator No Verification, Merge Dragons Generator Ios, Merge Dragons Generator Android, Merge Dragons Codes, Merge Dragons Resource Generator, and Merge Dragons Gems For Free is that it is completely free and easy to use. All you need to do is enter your Merge Dragons account information and you can get free Gems and resources in no time. So what are you waiting for? Get your free Gems and resources and start building your very own dragon kingdom today!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/Free_Dragon_City_Gems_(2023)_Generator_No_Human_Verification_No_Survey
Free Dragon City Gems (2023) Generator No Human Verification No Survey
2023-02-22T20:39:22Z
<p>Skyts0401: Created page with "Are you a fan of Dragon City? Are you looking for ways to get more gems and resources to help you level up faster? Do you want to maximize your gaming experience? If so, you..."</p>
<hr />
<div>Are you a fan of Dragon City? Are you looking for ways to get more gems and resources to help you level up faster? Do you want to maximize your gaming experience? If so, you’ve come to the right place!<br />
<br />
Introducing the Dragon City Gems Generator, Dragon City Gems Cheats, Dragon City Gems Hack, Dragon City Generator 2023, Dragon City Generator No Verification, Dragon City Generator Ios, Dragon City Generator Android, Dragon City Codes, Dragon City Resource Generator and Dragon City Gems For Free. All of these tools are designed to help you get the most out of your Dragon City gaming experience.<br />
<br />
Dragon City Gems Generator is a powerful tool that can be used to generate free, unlimited amounts of gems for your Dragon City account. This generator is easy to use and can be used on both iOS and Android devices. It is designed to be safe, secure and reliable, so you can be sure that your account won’t be compromised. With the Dragon City Gems Generator, you’ll be able to generate enough gems to give you a huge advantage over your opponents.<br />
<br />
Dragon City Gems Cheats and Hack allow you to get access to powerful cheats and hacks that can give you an edge in your game. These cheats and hacks are designed to help you get the most out of your gaming experience. With the Dragon City Gems Cheats and Hack, you can get unlimited resources, upgrade your dragons and even cheat your way to the top of the leaderboards.<br />
<br />
Dragon City Generator 2023 is a powerful tool that can be used to generate unlimited amounts of resources for your Dragon City account. This generator is designed to be safe, secure and reliable, so you can be sure your account won’t be compromised. With the Dragon City Generator 2023, you’ll be able to generate enough resources to give you a huge advantage over your opponents.<br />
<br />
Dragon City Generator No Verification allows you to get access to powerful cheats and hacks without having to verify your account. This generator is designed to be safe, secure and reliable, so you can be sure your account won’t be compromised. With the Dragon City Generator No Verification, you can get unlimited resources, upgrade your dragons and even cheat your way to the top of the leaderboards.<br />
<br />
Dragon City Generator Ios and Android are powerful tools that can be used to generate unlimited amounts of resources for your Dragon City account. These generators are designed to be safe, secure and reliable, so you can be sure your account won’t be compromised. With the Dragon City Generator Ios and Android, you can get unlimited resources, upgrade your dragons and even cheat your way to the top of the leaderboards.<br />
<br />
Dragon City Codes allow you to get access to powerful cheats and hacks without having to verify your account. These codes are designed to help you get the most out of your gaming experience. With the Dragon City Codes, you can get unlimited resources, upgrade your dragons and even cheat your way to the top of the leaderboards.<br />
<br />
Finally, the Dragon City Resource Generator is a powerful tool that can be used to generate unlimited amounts of resources for your Dragon City account. This generator is designed to be safe, secure and reliable, so you can be sure your account won’t be compromised. With the Dragon City Resource Generator, you can get unlimited resources, upgrade your dragons and even cheat your way to the top of the leaderboards.<br />
<br />
So what are you waiting for? Get your hands on the Dragon City Gems Generator, Dragon City Gems Cheats, Dragon City Gems Hack, Dragon City Generator 2023, Dragon City Generator No Verification, Dragon City Generator Ios, Dragon City Generator Android, Dragon City Codes, Dragon City Resource Generator and Dragon City Gems For Free and take your gaming experience to the next level!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/New.updated_Rainbow_Six_Siege_R6_Generator_2023_Free_No_Verification_Free
New.updated Rainbow Six Siege R6 Generator 2023 Free No Verification Free
2023-02-22T20:37:00Z
<p>Skyts0401: Created page with "Welcome to the world of Rainbow Six Siege R6 Generator! With the latest 2023 edition of Rainbow Six Siege R6, you can now access all of the cheats, hacks, and resources you ne..."</p>
<hr />
<div>Welcome to the world of Rainbow Six Siege R6 Generator! With the latest 2023 edition of Rainbow Six Siege R6, you can now access all of the cheats, hacks, and resources you need to get the most out of your gaming experience. Whether you’re looking for a way to get free resources or just want to explore the many options available to you, the Rainbow Six Siege R6 Generator is the perfect tool for you.<br />
<br />
With the Rainbow Six Siege R6 Generator, you can generate unlimited amounts of in-game currency that can be used to purchase weapons, upgrades, and skins. You can also generate resources and codes that can be used to unlock special items and upgrades. Plus, you can use the Rainbow Six Siege R6 Generator to generate free Rainbow Six Siege R6 codes to redeem for exclusive in-game items and more.<br />
<br />
The Rainbow Six Siege R6 Generator is an easy-to-use and secure tool that provides 24/7 access to all the cheats and hacks you need to get the most out of your gaming experience. With the newest edition of Rainbow Six Siege R6, you can access all the resources you need with just a few clicks. All you have to do is connect to the internet and start generating resources. You will be able to find all the information you need in a few simple steps.<br />
<br />
The Rainbow Six Siege R6 Generator is designed with both beginner and experienced players in mind. With the user-friendly interface and easy-to-follow instructions, you can quickly and easily generate unlimited resources and codes. Plus, you can enjoy the additional benefits of using this tool such as the ability to access new features, bonuses, and exclusive rewards.<br />
<br />
Whether you’re looking for cheats, hacks, or resources, the Rainbow Six Siege R6 Generator can provide you with everything you need to get the most out of your gaming experience. With the 2023 edition of Rainbow Six Siege R6, you can now unlock all the cheats, hacks, and resources you need to get the most out of your gaming experience. Enjoy the benefits of using this tool to get the most out of your gaming experience and become the ultimate Rainbow Six Siege R6 champion!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/Free_Board_Kings_Generator_No_Human_Verification_No_Survey_(2023_Method)
Free Board Kings Generator No Human Verification No Survey (2023 Method)
2023-02-22T20:33:27Z
<p>Skyts0401: Created page with "Welcome to the Board Kings Gems Generator! Are you looking for the ultimate way to get free gems in Board Kings? Well, look no further! Our Board Kings Gems Generator is one o..."</p>
<hr />
<div>Welcome to the Board Kings Gems Generator! Are you looking for the ultimate way to get free gems in Board Kings? Well, look no further! Our Board Kings Gems Generator is one of the most popular and advanced gems generators available on the internet.<br />
<br />
We understand how difficult it can be to find Board Kings gems and resources. That's why we've created the Board Kings Gems Generator, which is designed to make obtaining gems and resources in Board Kings easier than ever. With a few simple clicks, you can generate unlimited amounts of gems for free.<br />
<br />
The Board Kings Gems Generator is one of the most advanced and secure gems generators available today. It works on all platforms, including Android, iOS, and PC. It's easy to use and requires no verification or downloads. All you need to do is enter your Board Kings username and hit the generate button. Within minutes, you'll have your gems and resources available for you to use.<br />
<br />
The Board Kings Gems Generator is constantly updated with new features and updates to ensure it's always working and secure. We also offer a variety of other helpful tools and resources such as Board Kings Codes and the Board Kings Resource Generator. With these tools, you can easily get access to exclusive resources and bonus items in Board Kings.<br />
<br />
We hope our Board Kings Gems Generator helps you get the most out of your Board Kings experience. With our generator, you can get unlimited amounts of gems and resources for free and without any hassle. If you have any questions or feedback, please feel free to contact us. We're always here to help!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Unlimited-free)_Moviestarplanet_Diamonds_Starcoins_Generator_2023_Updates_No_Human_Verification
(Unlimited-free) Moviestarplanet Diamonds Starcoins Generator 2023 Updates No Human Verification
2023-02-22T20:30:13Z
<p>Skyts0401: Created page with "Are you a fan of the popular online game Moviestarplanet? Have you been looking for a way to get your hands on diamond and starcoins without having to spend your hard-earned m..."</p>
<hr />
<div>Are you a fan of the popular online game Moviestarplanet? Have you been looking for a way to get your hands on diamond and starcoins without having to spend your hard-earned money? If so, you’re going to love the Moviestarplanet Diamonds Starcoins Generator, Moviestarplanet Diamonds Starcoins Cheats, Moviestarplanet Diamonds Starcoins Hack, Moviestarplanet Generator 2023, Moviestarplanet Generator No Verification, Moviestarplanet Generator Ios, Moviestarplanet Generator Android, Moviestarplanet Codes, Moviestarplanet Resource Generator, and Moviestarplanet Diamonds Starcoins For Free! <br />
<br />
The Moviestarplanet Diamonds Starcoins Generator is a great tool that allows you to generate and hack your way to unlimited amounts of diamond and starcoins in one of the world’s most popular online games! This generator is incredibly easy to use and takes only a few minutes to get you the resources you need. Plus, it's completely free and safe to use, so you can be sure that your account is safe from malicious attacks. <br />
<br />
The Moviestarplanet Diamonds Starcoins Cheats and Hack are also great tools that will help you get the edge in the game. With these cheats and hacks, you can gain access to powerful items, weapons, and even coins that you wouldn't have access to otherwise. Plus, they are also 100% undetectable, so you can be sure that your account will remain safe. <br />
<br />
The Moviestarplanet Generator 2023 is another great tool that will help you get the most out of your game. This generator can generate a large number of coins and diamonds at once, giving you access to powerful items and weapons. Plus, it's completely free and safe to use, so you can be sure that your account will remain safe. <br />
<br />
The Moviestarplanet Generator No Verification is also a great tool that can help you get diamonds and starcoins without having to go through the hassle of verifying your account. This generator is also free and safe to use, so you can be sure that your account will remain safe from malicious attacks. <br />
<br />
The Moviestarplanet Generator Ios and Android are two great tools that can help you get the most out of the game on your mobile devices. These generators are incredibly easy to use and can generate tons of coins and diamonds in a matter of minutes. Plus, they are both free and safe to use, so you can be sure that your account is safe. <br />
<br />
The Moviestarplanet Codes and Resource Generator are also great tools that can help you get the most out of your game. These codes are incredibly easy to use and can generate tons of coins and diamonds in a matter of minutes. Plus, they are both free and safe to use, so you can be sure that your account is safe. <br />
<br />
Finally, the Moviestarplanet Diamonds Starcoins For Free is a great tool that will help you get the most out of the game without spending any money. This generator is incredibly easy to use and can generate tons of diamonds and starcoins in a matter of minutes. Plus, it's completely free and safe to use, so you can be sure that your account is safe. <br />
<br />
So if you're looking for a way to get your hands on diamond and starcoins without having to spend your hard-earned money, the Moviestarplanet Diamonds Starcoins Generator, Moviestarplanet Diamonds Starcoins Cheats, Moviestarplanet Diamonds Starcoins Hack, Moviestarplanet Generator 2023, Moviestarplanet Generator No Verification, Moviestarplanet Generator Ios, Moviestarplanet Generator Android, Moviestarplanet Codes, Moviestarplanet Resource Generator, and Moviestarplanet Diamonds Starcoins For Free are all great tools that you should check out!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Easy-working)_Star_Stable_Stars_Generator_2023_New_Updated
(Easy-working) Star Stable Stars Generator 2023 New Updated
2023-02-22T20:29:04Z
<p>Skyts0401: Created page with "Are you a fan of the immensely popular online game, Star Stable? Then you are in luck! We are proud to introduce the Star Stable Star Coins Jorvik Coins Generator, Star Stable..."</p>
<hr />
<div>Are you a fan of the immensely popular online game, Star Stable? Then you are in luck! We are proud to introduce the Star Stable Star Coins Jorvik Coins Generator, Star Stable Star Coins Jorvik Coins Cheats, Star Stable Star Coins Jorvik Coins Hack, Star Stable Generator 2023, Star Stable Generator No Verification, Star Stable Generator Ios, Star Stable Generator Android, Star Stable Codes, Star Stable Resource Generator, Star Stable Star Coins Jorvik Coins For Free!<br />
<br />
Star Stable is an incredibly engaging and challenging game that has been captivating its players for years. In this game, players take on the role of a horse rider and explore the magical world of Jorvik. As you progress through the game, you will acquire various resources and items which can be used to upgrade and customize your character.<br />
<br />
However, acquiring the necessary resources can be quite time-consuming and challenging. This is where the Star Stable Star Coins Jorvik Coins Generator comes in! With this amazing tool, you can generate unlimited amounts of Star Coins and Jorvik Coins to use in the game. This generator is also equipped with a variety of cheats, hacks, and codes to make your gaming experience even more exciting.<br />
<br />
The Star Stable Star Coins Jorvik Coins Generator is also equipped with advanced security measures, such as anti-ban and anti-spam protection. This ensures that your account remains safe and secure, and your personal information is kept confidential. The generator also comes with a no-verification feature, which means you don’t have to verify your account details when using the generator.<br />
<br />
The Star Stable Star Coins Jorvik Coins Generator is also compatible with both iOS and Android devices. This means that you can generate the required resources from the convenience of your own device.<br />
<br />
The Star Stable Star Coins Jorvik Coins Generator is incredibly user-friendly and easy to use. All you need to do is enter your Star Stable account details, choose the desired amount of Star Coins and Jorvik Coins and click “Generate”. In a few moments, you will have the desired amount of resources, ready to be used in the game.<br />
<br />
If you are a Star Stable fan, then the Star Stable Star Coins Jorvik Coins Generator is a must-have tool! With this amazing resource generator, you can generate unlimited amounts of Star Coins and Jorvik Coins, as well as a variety of cheats, hacks, and codes for a more exciting gaming experience. Try it today and start exploring the magical world of Jorvik!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/Free.method_Star_Stable_Star_Coins_Jorvik_Coins_Generator_Free_2023_No_Human_Verification
Free.method Star Stable Star Coins Jorvik Coins Generator Free 2023 No Human Verification
2023-02-22T20:27:10Z
<p>Skyts0401: Created page with "Are you a fan of the immensely popular online game, Star Stable? Then you are in luck! We are proud to introduce the Star Stable Star Coins Jorvik Coins Generator, Star Stable..."</p>
<hr />
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(New_Update)_Simcity_Buildit_Simoleons_Simcash_Generator_2023_Free_No_Verification_No_Survey
(New Update) Simcity Buildit Simoleons Simcash Generator 2023 Free No Verification No Survey
2023-02-22T20:24:04Z
<p>Skyts0401: Created page with "Are you looking for a way to get your hands on Simcity Buildit Simoleons Simcash without having to pay for it? If so, then you should check out the Simcity Buildit Simoleons S..."</p>
<hr />
<div>Are you looking for a way to get your hands on Simcity Buildit Simoleons Simcash without having to pay for it? If so, then you should check out the Simcity Buildit Simoleons Simcash Generator, Simcity Buildit Simoleons Simcash Cheats, Simcity Buildit Simoleons Simcash Hack, Simcity Buildit Generator 2023, Simcity Buildit Generator No Verification, Simcity Buildit Generator Ios, Simcity Buildit Generator Android, Simcity Buildit Codes, Simcity Buildit Resource Generator, and Simcity Buildit Simoleons Simcash For Free.<br />
<br />
All of these tools are designed to help you get the most out of your Simcity Buildit experience. Whether you're just starting out or have been playing the game for a while, these tools will help you get ahead and make the most of your gaming experience.<br />
<br />
The Simcity Buildit Simoleons Simcash Generator is an easy-to-use tool that allows you to generate an unlimited amount of Simcash, which you can use to purchase items in the game. You don't have to worry about verification or any other time-consuming processes; simply input your chosen amount and the generator will do the rest.<br />
<br />
The Simcity Buildit Simoleons Simcash Cheats are designed to help you cheat your way through the game, allowing you to get unlimited Simcash without having to spend any of your own money. This can be particularly helpful if you want to upgrade your city to the next level without having to pay for it.<br />
<br />
The Simcity Buildit Simoleons Simcash Hack is designed to give you an advantage over other players by unlocking additional features in the game. This way, you can take your Simcity experience to the next level without having to spend any of your own money.<br />
<br />
The Simcity Buildit Generator 2023 is the most up-to-date version of the Simcity Buildit tool available, allowing you to generate an unlimited amount of Simcash. This is perfect for those who want to upgrade their city quickly, allowing them to get the most out of their gaming experience.<br />
<br />
The Simcity Buildit Generator No Verification is also a great tool, allowing you to bypass any time-consuming verification processes. This makes it much easier to get the most out of your Simcity experience, and it's incredibly useful for those who want to upgrade their city quickly.<br />
<br />
The Simcity Buildit Generator Ios and Simcity Buildit Generator Android tools are designed for those who want to take their Simcity experience further by playing on their mobile devices. Both tools are incredibly easy to use and allow you to generate an unlimited amount of Simcash.<br />
<br />
The Simcity Buildit Codes are designed to help you get the most out of your Simcity experience. These codes can be used to get exclusive items or discounts on items, allowing you to get the most out of your gaming experience.<br />
<br />
The Simcity Buildit Resource Generator is a powerful tool that allows you to generate an unlimited amount of resources, allowing you to get the most out of your Simcity experience. This allows you to upgrade your city faster than ever before, without having to spend any of your own money.<br />
<br />
Finally, the Simcity Buildit Simoleons Simcash For Free tool is designed to help you get the most out of your Simcity experience without having to spend any of your own money. This is an easy-to-use tool that allows you to generate an unlimited amount of Simcash without having to pay for it.<br />
<br />
So, if you're looking for a way to get the most out of your Simcity experience without having to pay for it, then check out all these tools for Simcity Buildit Simoleons Simcash today!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/New_Working_Hack_Marvel_Future_Fight_Gold_Crystals_Generator_2023_Cheats_No_Human_Verification_Tested_On_Android_Ios
New Working Hack Marvel Future Fight Gold Crystals Generator 2023 Cheats No Human Verification Tested On Android Ios
2023-02-22T20:18:40Z
<p>Skyts0401: Created page with "Are you looking for a reliable way to get unlimited Marvel Future Fight Gold Crystals? Look no further! Our Marvel Future Fight Gold Crystals Generator, Marvel Future Fight Go..."</p>
<hr />
<div>Are you looking for a reliable way to get unlimited Marvel Future Fight Gold Crystals? Look no further! Our Marvel Future Fight Gold Crystals Generator, Marvel Future Fight Gold Crystals Cheats, and Marvel Future Fight Gold Crystals Hack are the perfect tools to help you get the most out of your Marvel Future Fight experience.<br />
<br />
The Marvel Future Fight Generator 2023, Marvel Future Fight Generator No Verification, Marvel Future Fight Generator Ios, and Marvel Future Fight Generator Android are the most advanced tools available to help you get the most out of your Marvel Future Fight gaming experience. Not only will these tools help you get the most out of your gaming experience, but they will also help you save time and money. With these tools, you don’t need to worry about investing in costly upgrades or spending real money to get the items you need. Instead, you can use the Marvel Future Fight Codes, Marvel Future Fight Resource Generator, and Marvel Future Fight Gold Crystals for free to quickly and easily get the items you need in the game.<br />
<br />
<br />
CLICK HERE TO GET FREE:: https://cheats.tips/92609b9<br />
<br />
<br />
The Marvel Future Fight Gold Crystals Generator, Marvel Future Fight Gold Crystals Cheats, and Marvel Future Fight Gold Crystals Hack are the perfect tools to help you get the most out of your Marvel Future Fight gaming experience. These tools are designed to be easy to use and are constantly updated with new content, so you can always stay on top of the latest trends in the game. With these tools, you can effortlessly get the most out of your Marvel Future Fight gaming experience. Whether you’re just starting out in the game or you’ve been playing for years, these tools will help you get the most out of your gaming experience.<br />
<br />
So, if you’re looking for an easy way to get the most out of your Marvel Future Fight gaming experience, be sure to check out our Marvel Future Fight Gold Crystals Generator, Marvel Future Fight Gold Crystals Cheats, and Marvel Future Fight Gold Crystals Hack. With these tools, you can quickly and easily get the most out of your Marvel Future Fight gaming experience. And, with the Marvel Future Fight Codes, Marvel Future Fight Resource Generator, and Marvel Future Fight Gold Crystals for free, you’ll have everything you need to get the most out of your gaming experience. So start using these tools today and get the most out of your Marvel Future Fight gaming experience!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Premium)_Hempire_Apk_Hack_Latest_Version_2023_New_Generator_Cash_Diamonds_Cheats_For_Free
(Premium) Hempire Apk Hack Latest Version 2023 New Generator Cash Diamonds Cheats For Free
2023-02-22T20:15:00Z
<p>Skyts0401: Created page with "Are you looking for a way to get free Hempire Cash Diamonds? If so, then you’ve come to the right place! Our Hempire Cash Diamonds Generator, Hempire Cash Diamonds Cheats, H..."</p>
<hr />
<div>Are you looking for a way to get free Hempire Cash Diamonds? If so, then you’ve come to the right place! Our Hempire Cash Diamonds Generator, Hempire Cash Diamonds Cheats, Hempire Cash Diamonds Hack, Hempire Generator 2023, Hempire Generator No Verification, Hempire Generator Ios, Hempire Generator Android, Hempire Codes, Hempire Resource Generator, and Hempire Cash Diamonds For Free are all here to help you get free Hempire Cash Diamonds!<br />
<br />
Hempire Cash Diamonds are used in the Hempire game to buy upgrades, new buildings, and other items. Without them, it can be difficult to progress in the game. This is why we have created our Hempire Cash Diamonds Generator, Hempire Cash Diamonds Cheats, Hempire Cash Diamonds Hack, Hempire Generator 2023, Hempire Generator No Verification, Hempire Generator Ios, Hempire Generator Android, Hempire Codes, Hempire Resource Generator, and Hempire Cash Diamonds For Free!<br />
<br />
<br />
CLICK HERE TO GET FREE:: https://cheats.tips/aef8c44<br />
<br />
<br />
Our Hempire Cash Diamonds Generator will help you generate as many Hempire Cash Diamonds as you need for your game. The generator is easy to use and requires no verification or additional downloads. Just enter your username and select the number of Hempire Cash Diamonds you want. With just a few clicks, you’ll have your Hempire Cash Diamonds in no time!<br />
<br />
Our Hempire Cash Diamonds Cheats and Hempire Cash Diamonds Hack are designed to help you get the most out of your Hempire game. These cheats and hacks will give you access to a wide range of items, power-ups, and bonuses that you wouldn’t be able to get otherwise. With the help of our cheats and hacks, you’ll be sure to advance your game quickly and easily.<br />
<br />
The Hempire Generator 2023, Hempire Generator No Verification, Hempire Generator Ios, Hempire Generator Android, Hempire Codes, Hempire Resource Generator, and Hempire Cash Diamonds For Free are all available to help you get the most out of your Hempire game. With these helpful tools, you’ll be able to get the most out of your Hempire game and get the most out of your Hempire Cash Diamonds!<br />
<br />
So, if you’re looking for a way to get Hempire Cash Diamonds for free, then look no further than our Hempire Cash Diamonds Generator, Hempire Cash Diamonds Cheats, Hempire Cash Diamonds Hack, Hempire Generator 2023, Hempire Generator No Verification, Hempire Generator Ios, Hempire Generator Android, Hempire Codes, Hempire Resource Generator, and Hempire Cash Diamonds For Free! With these helpful tools, you’ll be sure to get the most out of your Hempire game and get the most out of your Hempire Cash Diamonds!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Unlimited_Cheat)**free_Stumble_Guys_Gems_Generator_2023_New_Year_Version_2023
(Unlimited Cheat)**free Stumble Guys Gems Generator 2023 New Year Version 2023
2023-02-22T20:11:06Z
<p>Skyts0401: Created page with " Are you looking for a way to get Stumble Guys Gems for free? If so, you're in luck! We have the perfect solution with our Stumble Guys Gems Generator. Whether you're playing ..."</p>
<hr />
<div> Are you looking for a way to get Stumble Guys Gems for free? If so, you're in luck! We have the perfect solution with our Stumble Guys Gems Generator. Whether you're playing on an iOS, Android, or PC device, our Stumble Guys Gems Generator can provide you with free gems in no time.<br />
<br />
Our Stumble Guys Generator is the quickest and easiest way to get unlimited gems. The generator requires no verification, so you don't need to worry about getting banned or putting your account at risk. Furthermore, it's available for use on all devices, so you can get free gems regardless of your device type.<br />
<br />
<br />
CLICK HERE TO GET FREE:: https://cheats.tips/96ba3b6<br />
<br />
<br />
Using the Stumble Guys Generator is easy. All you have to do is enter your username, select the amount of gems you'd like to generate, and click “Generate”. Within minutes, you'll have all the gems you need to upgrade your player and progress further.<br />
<br />
We understand that you may be hesitant to use a Stumble Guys Generator. That's why we've created our own Stumble Guys Cheats and Hacks. Our cheats and hacks are designed to help you get the most out of the game and can provide you with an advantage over other players. With our cheats and hacks, you can get unlimited gems and resources at no cost.<br />
<br />
If you're looking for a way to get free gems in Stumble Guys, our Stumble Guys Resource Generator is the perfect solution. The generator is updated regularly with new codes and can provide you with all the resources you need to get ahead.<br />
<br />
If you want to get Stumble Guys Gems for free, our Stumble Guys Gems Generator is the perfect solution. The generator is completely safe to use and requires no verification, so you don't need to worry about getting banned or putting your account at risk. Furthermore, it can be used on all devices, so you can get free gems regardless of your device type.<br />
<br />
We hope that this article has helped you understand how our Stumble Guys Gems Generator can help you get free gems and resources for Stumble Guys. If you have any questions or feedback, please don't hesitate to contact us. We look forward to helping you enjoy Stumble Guys even more!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Real_Generator)_Marvel_Strike_Force_Gold_Orbs_Generator_Hack_Cheats_2023_For_Android_Ios_No_Human_Verification
(Real Generator) Marvel Strike Force Gold Orbs Generator Hack Cheats 2023 For Android Ios No Human Verification
2023-02-22T20:04:19Z
<p>Skyts0401: Created page with "Are you a Marvel Strike Force fan looking to get more gold orbs, resources, and codes? If so, you’re in luck! Marvel Strike Force Gold Orbs Generator, Marvel Strike Force Go..."</p>
<hr />
<div>Are you a Marvel Strike Force fan looking to get more gold orbs, resources, and codes? If so, you’re in luck! Marvel Strike Force Gold Orbs Generator, Marvel Strike Force Gold Orbs Cheats, Marvel Strike Force Gold Orbs Hack, Marvel Strike Force Generator 2023, Marvel Strike Force Generator No Verification, Marvel Strike Force Generator Ios, Marvel Strike Force Generator Android, Marvel Strike Force Codes, Marvel Strike Force Resource Generator, and Marvel Strike Force Gold Orbs For Free are all here to make your gaming experience easier and more enjoyable.<br />
<br />
Marvel Strike Force Gold Orbs Generator is the perfect way to get more gold orbs and resources in the game. Whether you’re looking to upgrade your gear, purchase special abilities, or just get more out of your game, this generator has you covered. This generator is completely safe to use and is updated regularly with new codes and resources. With the Marvel Strike Force Gold Orbs Generator, you can easily generate the exact amount of gold orbs and resources you need for whatever task you want to accomplish.<br />
<br />
<br />
CLICK HERE TO GET FREE:: https://cheats.tips/fdeb280<br />
<br />
<br />
Marvel Strike Force Gold Orbs Cheats is another great tool that can help you get ahead in the game. With this cheat, you can easily access and use codes and resources for free. This cheat also updates regularly with new and updated codes, making sure you always have the edge over your opponents. You can also use the cheat to get yourself an edge in PvP combat and other game modes.<br />
<br />
Marvel Strike Force Gold Orbs Hack is a great way to get yourself more gold orbs in the game. This hack allows you to access and use codes and resources in the game for free. This hack also updates regularly with new and updated codes, making sure you always have the edge over your opponents.<br />
<br />
Marvel Strike Force Generator 2023 is the latest and greatest generator available for the game. This generator is loaded with new features that allow you to generate the exact amount of gold orbs and resources you need for whatever task you want to accomplish. With this generator, you can easily access and use codes and resources for free.<br />
<br />
Marvel Strike Force Generator No Verification is the perfect tool for gamers who don’t want to have to go through the hassle of verifying their accounts. This generator is completely safe to use and is updated regularly with new codes and resources. With this generator, you can easily generate the exact amount of gold orbs and resources you need for whatever task you want to accomplish.<br />
<br />
Marvel Strike Force Generator Ios and Marvel Strike Force Generator Android are two great tools that can make your gaming experience easier and more enjoyable. With these generators, you can easily access and use codes and resources for free. Both of these generators are updated regularly with new codes and resources, making sure you always have the edge over your opponents.<br />
<br />
Marvel Strike Force Codes and Marvel Strike Force Resource Generator are two more great tools that can make your gaming experience easier and more enjoyable. These generators allow you to access and use codes and resources for free. Both of these generators are updated regularly with new codes and resources, making sure you always have the edge over your opponents.<br />
<br />
Finally, Marvel Strike Force Gold Orbs For Free is a great way to get more gold orbs in the game. With this generator, you can easily generate the exact amount of gold orbs and resources you need for whatever task you want to accomplish. With this generator, you can easily access and use codes and resources for free.<br />
<br />
Whether you’re looking to upgrade your gear, purchase special abilities, or just get more out of your game, these generators have you covered. With Marvel Strike Force Gold Orbs Generator, Marvel Strike Force Gold Orbs Cheats, Marvel Strike Force Gold Orbs Hack, Marvel Strike Force Generator 2023, Marvel Strike Force Generator No Verification, Marvel Strike Force Generator Ios, Marvel Strike Force Generator Android, Marvel Strike Force Codes, Marvel Strike Force Resource Generator, and Marvel Strike Force Gold Orbs For Free, you can easily get the most out of your Marvel Strike Force game.</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/(Free)_Kim_Kardashian_Hollywood_Cash_Stars_Generator_2023_No_Human_Verification
(Free) Kim Kardashian Hollywood Cash Stars Generator 2023 No Human Verification
2023-02-22T19:58:43Z
<p>Skyts0401: Created page with "Are you looking for a way to get your hands on more Kim Kardashian Hollywood Cash Stars? If so, then you've come to the right place! Kim Kardashian Hollywood Cash Stars Gener..."</p>
<hr />
<div>Are you looking for a way to get your hands on more Kim Kardashian Hollywood Cash Stars? If so, then you've come to the right place!<br />
<br />
Kim Kardashian Hollywood Cash Stars Generator, Kim Kardashian Hollywood Cash Stars Cheats, Kim Kardashian Hollywood Cash Stars Hack, Kim Kardashian Hollywood Generator 2023, Kim Kardashian Hollywood Generator No Verification, Kim Kardashian Hollywood Generator Ios, Kim Kardashian Hollywood Generator Android, Kim Kardashian Hollywood Codes, Kim Kardashian Hollywood Resource Generator, and Kim Kardashian Hollywood Cash Stars For Free, are all great options for those looking to get more virtual currency in the game.<br />
<br />
<br />
CLICK HERE TO GET FREE:: http://www.helpmecheat.live/0590156<br />
<br />
<br />
The Kim Kardashian Hollywood Cash Stars Generator is a great choice for those looking for the quickest way to get their hands on more stars. This generator will allow you to generate unlimited amounts of stars, allowing you to purchase whatever you need in the game. The generator is easy to use and only requires a few clicks of the mouse to get started.<br />
<br />
If you're looking for something a bit more advanced, then the Kim Kardashian Hollywood Cash Stars Cheats is a great option. This cheat will allow you to manipulate the game in various ways, giving you the edge over your opponents. This cheat is also easy to use and can be found online.<br />
<br />
The Kim Kardashian Hollywood Cash Stars Hack is another great option for those looking to get ahead in the game. This hack will give you access to an unlimited amount of stars, allowing you to purchase whatever you need in the game. The hack is also easy to use and can be found online.<br />
<br />
Kim Kardashian Hollywood Generator 2023, Kim Kardashian Hollywood Generator No Verification, Kim Kardashian Hollywood Generator Ios, Kim Kardashian Hollywood Generator Android, Kim Kardashian Hollywood Codes, and Kim Kardashian Hollywood Resource Generator are all also great options for those looking to get more virtual currency in the game. These generators allow you to generate unlimited amounts of stars, allowing you to purchase whatever you need in the game. The generators are easy to use and can be found online.<br />
<br />
Finally, Kim Kardashian Hollywood Cash Stars For Free is another great option for those looking to get more virtual currency in the game. This generator will allow you to generate unlimited amounts of stars, allowing you to purchase whatever you need in the game. The generator is easy to use and can be found online.<br />
<br />
No matter which Kim Kardashian Hollywood Cash Stars Generator, Kim Kardashian Hollywood Cash Stars Cheats, Kim Kardashian Hollywood Cash Stars Hack, Kim Kardashian Hollywood Generator 2023, Kim Kardashian Hollywood Generator No Verification, Kim Kardashian Hollywood Generator Ios, Kim Kardashian Hollywood Generator Android, Kim Kardashian Hollywood Codes, Kim Kardashian Hollywood Resource Generator, or Kim Kardashian Hollywood Cash Stars For Free you choose, you can be sure that you will be able to get ahead in the game. All of these options are easy to use and can be found online. So, what are you waiting for? Get started today and start enjoying the game!</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/MaximaMaximaMaxima
MaximaMaximaMaxima
2023-02-22T16:44:17Z
<p>Skyts0401: Blanked the page</p>
<hr />
<div></div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/MaximaMaximaMaxima
MaximaMaximaMaxima
2023-02-22T16:44:12Z
<p>Skyts0401: Created page with "http://plantgenomics.snu.ac.kr/"</p>
<hr />
<div>http://plantgenomics.snu.ac.kr/</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-07-04T06:28:59Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
 found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
 markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
 while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
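A minimal sketch of the marker filter above, not the actual script. Assumptions: genotypes are coded "a", "b", "h" and "-" (missing), "hetero > 10" is read as a count of heterozygous calls, and depth is a per-marker mean.<br />

```python
def keep_marker(genotypes, mean_depth,
                max_missing_frac=0.10, max_hetero=10, min_depth=3):
    """Return True if a marker passes the missing, hetero and depth filters."""
    n = len(genotypes)
    if n == 0:
        return False
    missing = sum(1 for g in genotypes if g == "-")
    hetero = sum(1 for g in genotypes if g == "h")
    if missing / n > max_missing_frac:   # missing > 10%
        return False
    if hetero > max_hetero:              # hetero > 10 (assumed to be a count)
        return False
    if mean_depth < min_depth:           # depth < 3
        return False
    return True


# Hypothetical markers: (genotype calls, mean depth)
markers = {
    "m1": (["a"] * 50 + ["b"] * 48 + ["-"] * 2, 8.0),
    "m2": (["a"] * 40 + ["b"] * 40 + ["-"] * 20, 8.0),  # too many missing
    "m3": (["a"] * 45 + ["b"] * 40 + ["h"] * 15, 8.0),  # too many hetero
    "m4": (["a"] * 50 + ["b"] * 50, 2.0),               # depth too low
}
kept = [name for name, (calls, depth) in markers.items()
        if keep_marker(calls, depth)]
print(kept)
```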
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
 construct genetic map (JoinMap 4.1), just using the linkage groups with chr 3, 4 combined and chr 5 split.<br />
<br />
 ML mapping, Haldane mapping function<br />
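For reference, the Haldane function converts a recombination fraction r into map distance as d = -50 ln(1 - 2r) cM. A small sketch of the conversion in both directions (function names are only for illustration):<br />

```python
import math


def haldane_cM(r):
    """Map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cM):
    """Recombination fraction for a map distance given in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


for r in (0.01, 0.10, 0.30):
    d = haldane_cM(r)
    print(f"r = {r:.2f} -> {d:.2f} cM -> r = {haldane_r(d):.2f}")
```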
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
 copying sorted.bam file from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assemble (canu) the mungbean pacbio corrected reads for chloroplast, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
 tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
 wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
 For Quiver, aligning the Pacbio_chloroplast reads to vr.pb.cp.fasta needs pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but sam and bam are no longer supported in blasr, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
 An error occurred while running pbalign, so blasr was re-installed (guess it is a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
 samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
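vcf_filtering.py itself is not shown here. The sketch below is only a guess at what a 0.15 cutoff could look like: it assumes the filter keeps sites where the non-reference read fraction, taken from the INFO/AD tag written by mpileup, is at least 0.15. Function names and example records are hypothetical.<br />

```python
def alt_fraction(info):
    """Fraction of reads supporting non-reference alleles, from INFO 'AD=ref,alt1,...'."""
    for field in info.split(";"):
        if field.startswith("AD="):
            depths = [int(x) for x in field[3:].split(",")]
            total = sum(depths)
            return (total - depths[0]) / total if total else 0.0
    return 0.0  # no AD tag: treat as no alternative support


def filter_vcf(lines, min_alt_fraction=0.15):
    """Yield header lines unchanged and records passing the assumed cutoff."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if alt_fraction(cols[7]) >= min_alt_fraction:
            yield line


# Hypothetical records in the layout of 'samtools mpileup -v' output
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "cp\t100\t.\tA\tG,<*>\t0\t.\tDP=40;AD=30,10,0",
    "cp\t200\t.\tC\tT,<*>\t0\t.\tDP=40;AD=38,2,0",
]
kept = list(filter_vcf(example))
for line in kept:
    print(line)
```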
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
write a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
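A rough sketch of the gff case, not getCDS.py itself. Assumptions: GFF3-style attributes, CDS rows grouped by their Parent (or ID) attribute, segments joined in coordinate order, and minus-strand genes reverse-complemented. The GenBank case is not covered.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def cds_from_gff(gff_lines, genome):
    """Return {gene: CDS sequence} built from the CDS rows of a GFF file."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        gene = attrs.get("Parent") or attrs.get("ID") or "unknown"
        parts[(gene, cols[0], cols[6])].append((int(cols[3]), int(cols[4])))
    result = {}
    for (gene, seqid, strand), segs in parts.items():
        # GFF coordinates are 1-based and inclusive
        seq = "".join(genome[seqid][s - 1:e] for s, e in sorted(segs))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        result[gene] = seq
    return result


# Hypothetical toy input
genome = read_fasta([">cp toy", "ATGAAATTTGGG", "CCCTAA"])
gff = [
    "##gff-version 3",
    "cp\t.\tCDS\t1\t6\t.\t+\t0\tParent=geneA",
    "cp\t.\tCDS\t10\t12\t.\t+\t0\tParent=geneA",
    "cp\t.\tCDS\t13\t18\t.\t-\t0\tParent=geneB",
]
cds = cds_from_gff(gff, genome)
for gene, seq in cds.items():
    print(f">{gene}\n{seq}")
```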
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
 too many SNPs for joinmap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
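cal_seg_dist.py is not shown here. A common way to flag segregation distortion is a chi-square test against the expected ratio; the sketch below assumes a 1:1 expectation for the two parental genotypes (coded "a" and "b", missing and hetero calls ignored), which may not match the actual population or script.<br />

```python
CHI2_CRIT_DF1_P05 = 3.841  # chi-square critical value, df = 1, p = 0.05


def seg_dist_chi2(genotypes, codes=("a", "b")):
    """Chi-square statistic for deviation from a 1:1 ratio of the two codes."""
    n_a = genotypes.count(codes[0])
    n_b = genotypes.count(codes[1])
    n = n_a + n_b
    if n == 0:
        return 0.0
    expected = n / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


# Hypothetical markers
balanced = ["a"] * 50 + ["b"] * 50
skewed = ["a"] * 70 + ["b"] * 30 + ["-"] * 5
for name, calls in (("balanced", balanced), ("skewed", skewed)):
    chi2 = seg_dist_chi2(calls)
    print(name, chi2, "distorted" if chi2 > CHI2_CRIT_DF1_P05 else "ok")
```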
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
 Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
 and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit setting (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp to delete that line.<br />
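<br />
The manual edit of out.gp can also be scripted. This is a small sketch under the assumption that every offending line starts with `set mouse clipboardformat`; the function names are made up for illustration.<br />

```python
def strip_clipboardformat(gp_text):
    """Drop 'set mouse clipboardformat ...' lines from gnuplot script text."""
    kept = [
        line
        for line in gp_text.splitlines(keepends=True)
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
    return "".join(kept)


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the gnuplot script in place without the unsupported option."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(strip_clipboardformat(text))


if __name__ == "__main__":
    fix_gnuplot_script("out.gp")
```

After the script has been cleaned, running `gnuplot out.gp` should produce the plot.<br />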
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map the 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that markers from the same LG land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
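<br />
The contents of blastparse.py are not shown in this note. As a hedged illustration only, a parser for the tabular output (-outfmt 6) used above could keep the best-scoring hit per marker like this; the function name and the returned tuple layout are assumptions.<br />

```python
def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 lines.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Returns {query: (subject, start, end, bitscore)}.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][3]:
            # Minus-strand hits report sstart > send, so normalise the order.
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best


if __name__ == "__main__":
    with open("reseq_marker.blast") as handle:
        for marker, hit in sorted(best_hits(handle).items()):
            print("\t".join([marker, hit[0], str(hit[1]), str(hit[2])]))
```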
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like they were split-mapped, so realign using the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
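<br />
The depth files written above have three tab-separated columns (contig, position, depth). A rough sketch for summarising them into fixed windows before plotting follows; the window size and the function name are arbitrary choices for illustration.<br />

```python
from collections import defaultdict


def windowed_mean_depth(lines, window=10000):
    """Average per-base depth over fixed windows.

    Input lines follow the samtools depth layout: contig, 1-based position, depth.
    Returns {(contig, window_start): mean_depth}.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        window_start = (int(pos) - 1) // window * window + 1
        key = (contig, window_start)
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    with open("newcontig_illumina_endtoend.mapping.depth") as handle:
        for (contig, start), mean in sorted(windowed_mean_depth(handle).items()):
            print("%s\t%d\t%.2f" % (contig, start, mean))
```

Comparing the Illumina and PacBio window means side by side makes regions with a sudden depth drop easier to spot.<br />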
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
 python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
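<br />
fasta_devide.py is not listed in this note. The sketch below shows one way such a splitter could work, assuming the records are distributed round-robin over 20 files named with a `_devide<i>.fa` suffix; both the naming and the distribution strategy are assumptions.<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts=20):
    """Distribute records round-robin over n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


def write_parts(fasta_path, n_parts=20):
    """Write <stem>_devide<i>.fa files next to the input (hypothetical naming)."""
    stem = fasta_path.rsplit(".", 1)[0]
    with open(fasta_path) as handle:
        parts = split_records(read_fasta(handle), n_parts)
    for index, part in enumerate(parts):
        with open("%s_devide%d.fa" % (stem, index), "w") as out:
            for header, seq in part:
                out.write(">%s\n%s\n" % (header, seq))


if __name__ == "__main__":
    import sys
    write_parts(sys.argv[1])
```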
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
 (the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
 mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
 perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
 - if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
 wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
 gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
 mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
 (63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
 (to avoid confusing the LTR_99 results with these, make separate directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
 perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
 /data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile<br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (LTR library has '(' symbol, resulting in ProtExcluder error, so change the format)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
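<br />
The header reformatting step above exists because parentheses in the LTR library headers made ProtExcluder fail. The actual script is not shown here; a minimal sketch of that kind of clean-up, assuming the parentheses are simply replaced with underscores, could look like this.<br />

```python
import re
import sys


def clean_header(header):
    """Replace parentheses in a FASTA header line with underscores."""
    return re.sub(r"[()]", "_", header)


def reformat_fasta(lines):
    """Yield FASTA lines with cleaned headers; sequence lines pass through."""
    for line in lines:
        yield clean_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    # Usage (hypothetical): python this_script.py allLTR.lib > allLTR.lib.reformed
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(reformat_fasta(handle))
```

Whatever replacement character is used, the headers must stay unique after the substitution.<br />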
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
 nano config/config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
export PERL5LIB=$CEGMA/lib:$PERL5LIB<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
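<br />
To compare the two BUSCO runs, the one-line score in each short summary file can be parsed. This sketch assumes the summary line has the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`; check it against the actual output files before relying on it. The function name and dictionary keys are illustrative.<br />

```python
import re

# Assumed layout of the BUSCO one-line score, e.g. C:x%[S:x%,D:x%],F:x%,M:x%,n:N
SUMMARY_PATTERN = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages as a dict, or None if no score line is found."""
    match = SUMMARY_PATTERN.search(text)
    if match is None:
        return None
    complete, single, duplicated, fragmented, missing = (
        float(value) for value in match.groups()[:5]
    )
    return {
        "complete": complete,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
        "n": int(match.group(6)),
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print(parse_busco_summary(handle.read()))
```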
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
 nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
 cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
 scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
mkdir tmp<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library, temporary directory)<br />
mpiexec -n 30 maker -fix_nucleotides maker_opts.ctl maker_bopts.ctl maker_exe.ctl >& maker_opts.ctl.log<br />
<br />
== 6/26 ==<br />
=== Mungbean pacbio assembly ===<br />
Check synteny blocks for evidence of chromosome splits and merges<br />
<br />
<br />
blast<br />
(NICEM:~/data/Mungbean/blast)<br />
makeblastdb -in Vradi.ver6.cor.pep.fa -dbtype 'prot'<br />
blastall -i adzuki.ver3.pep.fa.tr.cor.fa -d Vradi.ver6.cor.pep.fa -p blastp -e 1e-10 -b 5 -v 5 -m 8 -o mcscanx/old_Va.blast<br />
# same procedure for other organism protein<br />
<br />
<br />
MCScanX<br />
(NICEM:~/data/Mungbean/blast/mcscanx)<br />
python gffcombine.py Vradi_ver6.gff.sorted.by.TY.gff adzuki.ver3.gene.gff.cor.gff > old_Va.gff<br />
~/data/program/MCScanX/MCScanX old_Va<br />
# same procedure for other organism protein, just change the species name in gffcombine.py and command<br />
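<br />
MCScanX reads a simplified four-column gff (chromosome with a species prefix, gene id, start, end). gffcombine.py itself is not listed in this note; the sketch below only illustrates that conversion for one species, assuming standard GFF3 input with `gene` features and an `ID=` attribute. The prefix handling is an assumption.<br />

```python
def gff3_to_mcscanx(lines, prefix):
    """Convert GFF3 gene features to MCScanX gff rows.

    Each output row is: <prefix><seqid>, gene id, start, end (tab-separated).
    """
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = attributes.get("ID")
        if gene_id is None:
            continue
        rows.append("\t".join([prefix + fields[0], gene_id, fields[3], fields[4]]))
    return rows


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python this_script.py species.gff vr > species.mcscanx.gff
    with open(sys.argv[1]) as handle:
        print("\n".join(gff3_to_mcscanx(handle, sys.argv[2])))
```

The rows for the two species are then concatenated into a single .gff file whose basename matches the .blast file.<br />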
<br />
<br />
Circos<br />
(193:/data2/skyts0401/Mungbean/synteny/circos/)<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf synteny_Va_gene.conf -outputfile synteny_Va_gene.png<br />
# same procedure for other organism, change configuration file</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-28T02:14:04Z
<p>Skyts0401: /* 6/26 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
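<br />
The marker filter described above can be written as a small predicate. This is only a sketch: it assumes genotype calls coded as a / b / h with '-' for missing, and it treats both the missing and hetero thresholds as percentages, which the note does not state explicitly.<br />

```python
def keep_marker(genotypes, mean_depth,
                max_missing_pct=10.0, max_hetero_pct=10.0, min_depth=3.0):
    """Return True if a marker passes the missing / hetero / depth filters.

    genotypes: sequence of calls, 'h' for heterozygous and '-' for missing
    (assumed coding). mean_depth: mean read depth at the marker.
    """
    total = len(genotypes)
    if total == 0:
        return False
    missing_pct = 100.0 * sum(1 for call in genotypes if call == "-") / total
    hetero_pct = 100.0 * sum(1 for call in genotypes if call == "h") / total
    return (
        missing_pct <= max_missing_pct
        and hetero_pct <= max_hetero_pct
        and mean_depth >= min_depth
    )
```

Keeping the thresholds as parameters makes it easy to rerun the filter with different cutoffs for another map.<br />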
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3 + 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
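<br />
For reference, Haldane's mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) in centimorgans. The sketch below shows the conversion in both directions; it is a plain restatement of the formula, not what JoinMap does internally.<br />

```python
import math


def haldane_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (inverse of haldane_cm)."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))
```

Because the function assumes no crossover interference, distances are additive along the linkage group.<br />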
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
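<br />
A quick sanity check on the resulting VCF files is to count records by type. The sketch below classifies a record as a SNP when REF and every ALT allele are single bases and counts everything else as "other"; this length-based rule is a simplification chosen for illustration.<br />

```python
def count_variants(lines):
    """Count VCF records as 'snp' (single-base REF and ALTs) or 'other'."""
    counts = {"snp": 0, "other": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if len(ref) == 1 and all(len(alt) == 1 for alt in alts):
            counts["snp"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print(count_variants(handle))
```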
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Assemble (canu) the mungbean PacBio corrected reads for the chloroplast, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig contains LSC+IR, and the other contains SSC+IR).<br />
<br />
Then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
To polish with Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports sam or bam, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
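<br />
The source of vcf_filtering.py is not shown. As a rough sketch only, assuming the 0.15 in the output name is a minimum fraction of reads supporting a non-reference allele, taken from the FORMAT/AD field that mpileup writes with -t AD, the filter could be written as follows (function names and the threshold meaning are assumptions; Python 3 division is assumed):<br />

```python
def alt_fraction(vcf_line):
    """Return the fraction of reads supporting non-reference alleles, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys:
        return None
    # AD lists the reference depth first, then one depth per ALT allele.
    depths = [int(x) for x in values[keys.index("AD")].split(",") if x.isdigit()]
    total = sum(depths)
    if total == 0:
        return None
    return (total - depths[0]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Keep header lines and records whose non-reference fraction passes."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fraction = alt_fraction(line)
        if fraction is not None and fraction >= min_fraction:
            kept.append(line)
    return kept
```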
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
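<br />
getCDS.py itself is not reproduced here. The sketch below only illustrates the GFF case under stated assumptions: CDS rows follow the 9-column GFF3 layout, segments of one gene share a Parent (or ID) attribute, and segments are joined in coordinate order, which will not handle trans-spliced chloroplast genes. Function names are hypothetical:<br />

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_attributes(text):
    """Turn 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in text.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def cds_sequences(genome, gff_lines):
    """genome maps sequence name to sequence; returns {gene name: CDS sequence}."""
    segments = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        name = attrs.get("Parent") or attrs.get("ID") or "%s:%s-%s" % (cols[0], cols[3], cols[4])
        segments.setdefault(name, []).append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for name, parts in segments.items():
        parts.sort(key=lambda part: part[1])
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        result[name] = seq
    return result
```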
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
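<br />
The thresholds noted above (dp >= 5, missing < 13, hetero < 10) are applied inside vcfparse.py, whose code is not shown. One possible reading, given purely as an assumption, is that a genotype with depth below 5 counts as missing and a site is kept only when fewer than 13 samples are missing and fewer than 10 are heterozygous:<br />

```python
def keep_site(calls, min_dp=5, max_missing=13, max_hetero=10):
    """calls is a list of (genotype, depth) pairs, e.g. ("0/1", 8).

    Hypothetical rule: low-depth or uncalled genotypes count as missing,
    and the site passes only if both counts stay under their limits.
    """
    missing = 0
    hetero = 0
    for genotype, depth in calls:
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles or depth < min_dp:
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Tightening the missing limit to 12, as on 2/23, would correspond to calling keep_site(calls, max_missing=12) in this sketch.<br />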
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the pacbio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
Then draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
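<br />
Deleting that line from out.gp can also be scripted. This is a small sketch, not part of MUMmer; the function names and the out.fixed.gp output name are made up for illustration:<br />

```python
def strip_clipboardformat(lines):
    """Drop the 'set mouse clipboardformat' line that newer gnuplot rejects."""
    return [line for line in lines
            if not line.lstrip().startswith("set mouse clipboardformat")]


def fix_gnuplot_script(path_in, path_out):
    """Read a gnuplot script, remove the unsupported line, write the result."""
    with open(path_in) as handle:
        lines = handle.readlines()
    with open(path_out, "w") as handle:
        handle.writelines(strip_clipboardformat(lines))
```

For example, fix_gnuplot_script("out.gp", "out.fixed.gp") followed by running gnuplot on out.fixed.gp.<br />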
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that markers from the same chromosome land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
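<br />
blastparse.py is not shown. Since the blastn run above uses -outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore, a best-hit-per-marker parse might look like the sketch below (the choice of bitscore as the ranking key is an assumption):<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, subject start, bitscore)} for the top hit."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        sstart, bitscore = int(cols[8]), float(cols[11])
        # Keep the first hit seen, then replace it only by a higher bitscore.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, sstart, bitscore)
    return best
```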
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
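<br />
samtools depth writes three tab-separated columns (sequence name, 1-based position, depth). To compare the Illumina and PacBio depth files over the 2-4 Mb region, one option is to average depth in fixed windows; the window size and function name here are arbitrary choices for illustration:<br />

```python
def window_means(depth_lines, window=100000):
    """Return {(sequence, window start): mean depth} from samtools depth lines."""
    sums = {}
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        # Window starts are 1-based: 1, window + 1, 2 * window + 1, ...
        start = (int(pos) - 1) // window * window + 1
        total, count = sums.get((chrom, start), (0, 0))
        sums[(chrom, start)] = (total + int(depth), count + 1)
    return {key: total / count for key, (total, count) in sorted(sums.items())}
```

Windows where the mean depth drops sharply in both data sets are candidates for closer inspection.<br />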
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
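<br />
fasta_devide.py is not shown; contig_compare.sh expects 20 pieces numbered 0 to 19. A minimal sketch of such a split, assuming records are dealt out round-robin (the real script may split differently), is:<br />

```python
def read_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line, []])
        elif line and records:
            records[-1][1].append(line)
    return [(header, "".join(chunks)) for header, chunks in records]


def split_records(records, parts=20):
    """Deal records into `parts` lists so each blat job gets a similar count."""
    chunks = [[] for _ in range(parts)]
    for index, record in enumerate(records):
        chunks[index % parts].append(record)
    return chunks
```

Each chunk i would then be written to its own file, matching the _devide${i}.fa names used in the loop.<br />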
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure follows these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 had an installation problem, so I installed ver 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make separate directories 99 and 85 inside the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain the '(' symbol, which causes a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
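<br />
The header reformatting done before ProtExcluder (removing '(' from the LTR library headers) can be sketched as below. The actual header script is not shown, so the set of characters kept here is an assumption; '#' and '/' are preserved because RepeatMasker-style library names use them for class and family:<br />

```python
import re


def clean_fasta_headers(lines):
    """Replace runs of unusual characters in FASTA headers with '_'."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = re.sub(r"[^A-Za-z0-9_.#/-]+", "_", line[1:].strip())
            line = ">" + name.strip("_")
        cleaned.append(line)
    return cleaned
```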
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
mkdir tmp<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library, temporary directory)<br />
mpiexec -n 30 maker -fix_nucleotides maker_opts.ctl maker_bopts.ctl maker_exe.ctl >& maker_opts.ctl.log<br />
<br />
== 6/26 ==<br />
=== Mungbean pacbio assembly ===<br />
Check synteny blocks to find chromosomes that should be split or combined<br />
<br />
<br />
blast<br />
(NICEM:~/data/Mungbean/blast)<br />
makeblastdb -in Vradi.ver6.cor.pep.fa -dbtype 'prot'<br />
blastall -i adzuki.ver3.pep.fa.tr.cor.fa -d Vradi.ver6.cor.pep.fa -p blastp -e 1e-10 -b 5 -v 5 -m 8 -o mcscanx/old_Va.blast<br />
# same procedure for other organism protein<br />
<br />
<br />
MCScanX<br />
(NICEM:~/data/Mungbean/blast/mcscanx)<br />
python gffcombine.py Vradi_ver6.gff.sorted.by.TY.gff adzuki.ver3.gene.gff.cor.gff > old_Va.gff<br />
~/data/program/MCScanX/MCScanX old_Va<br />
# same procedure for other organism protein, just change the species name in gffcombine.py and command<br />
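<br />
MCScanX reads a tab-delimited gff with four columns (chromosome, gene, start, end). The sketch below shows one way to build such rows from GFF3 gene lines. It is not the gffcombine.py used here; the function name, the chrom_map argument and the use of the ID attribute as the gene name are assumptions.<br />

```python
def gff3_to_mcscanx(lines, chrom_map, feature="gene"):
    """Return (chromosome, gene_id, start, end) rows for MCScanX.

    chrom_map is a hypothetical dict that renames chromosomes, for example
    {"Vr01": "vr1"}; chromosomes missing from it are skipped.
    """
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature or cols[0] not in chrom_map:
            continue
        # Parse the key=value pairs of the GFF3 attribute column.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        rows.append((chrom_map[cols[0]], gene_id, int(cols[3]), int(cols[4])))
    return rows
```

The gene names written here have to match the sequence names in the blast output, otherwise MCScanX cannot link hits to positions.<br />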
<br />
<br />
Circos<br />
(193:/data2/skyts0401/Mungbean/synteny/circos/)<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf synteny_Va_gene.conf -outputfile synteny_Va_gene.png<br />
# same procedure for other organism, change configuration file</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-26T05:52:49Z
<p>Skyts0401: /* 5/29 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse the genotype data (JoinMap) and phenotype data into the IciMapping input format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so the map is being reconstructed<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Make the loc file (244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; this needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
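<br />
For reference, Haldane's mapping function converts a recombination fraction r into a map distance of -50 * ln(1 - 2r) cM. The small sketch below only illustrates that formula; it is not part of JoinMap, and the function name is made up for this note.<br />

```python
import math


def haldane_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # Haldane: d (Morgans) = -0.5 * ln(1 - 2r); multiply by 100 for cM.
    return -50.0 * math.log(1.0 - 2.0 * r)
```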
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam file from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
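<br />
The pairing step keeps only reads that are present in both FASTQ files. PE-pairing.py itself is not shown, so the following is just a minimal sketch of the idea; the function names and the handling of /1 and /2 suffixes are assumptions.<br />

```python
def read_fastq(lines):
    """Group FASTQ lines into 4-line records keyed by read name."""
    records = {}
    block = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            # Read name: text after '@' up to the first whitespace.
            name = block[0][1:].split()[0]
            # Assumption: mates are marked with a trailing /1 or /2.
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            records[name] = block
            block = []
    return records


def pair_reads(lines_1, lines_2):
    """Return the records of both files restricted to shared read names."""
    first = read_fastq(lines_1)
    second = read_fastq(lines_2)
    shared = [name for name in first if name in second]
    return [first[n] for n in shared], [second[n] for n in shared]
```

This version holds both files in memory, so it is only practical for small inputs; a real run on whole-genome reads would need a streaming or indexed approach.<br />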
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
Before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
These were simply assembled together (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried the blasr algorithm first, but SAM or BAM is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
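<br />
The CDS extraction can be sketched as follows for the GFF case. This is not the getCDS.py used above; the function names are hypothetical, the genome is assumed to be already loaded into a dict, and CDS segments are assumed to be grouped by their Parent (or ID) attribute.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(genome, gff_lines):
    """genome: dict of sequence name -> sequence; returns gene -> CDS string."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        key = attrs.get("Parent") or attrs.get("ID")
        segments[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for key, parts in segments.items():
        # Join segments in genomic order (GFF coordinates are 1-based, inclusive).
        parts.sort(key=lambda p: p[1])
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in parts)
        # Assumption: all segments of one gene share the same strand.
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        result[key] = seq
    return result
```

Genes whose segments lie on different strands (trans-spliced chloroplast genes, for example) would need separate handling and are not covered by this sketch.<br />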
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
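<br />
The marker filter (dp >= 5, missing < 13, hetero < 10) can be sketched as below. The actual vcfparse.py is not shown; the genotype codes, the function name and the decision to count low-depth calls as missing are all assumptions made for illustration.<br />

```python
def passes_filter(calls, min_depth=5, max_missing=13, max_hetero=10):
    """Decide whether one marker is kept.

    calls: list of (genotype, depth) tuples, one per sample, where genotype
    is "a", "b", "h" (heterozygous) or "-" (missing). These codes are
    hypothetical and only mirror the loc-file style.
    """
    missing = 0
    hetero = 0
    for genotype, depth in calls:
        # Assumption: a call below the depth cutoff is treated as missing.
        if genotype == "-" or depth < min_depth:
            missing += 1
        elif genotype == "h":
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Lowering max_missing makes the filter stricter, which is one way to reduce the number of markers passed on to JoinMap.<br />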
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
There were too many SNPs for JoinMap, so the filter was tightened to missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option written by mummerplot. Just edit out.gp and delete that line.<br />
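<br />
Instead of editing out.gp by hand, the offending line can be dropped with a few lines of Python. This is only a sketch; the function name is made up, and it assumes the unsupported option always starts with "set mouse clipboardformat" as in the error above.<br />

```python
def drop_mouse_clipboardformat(lines):
    """Return the gnuplot script lines without 'set mouse clipboardformat'."""
    return [
        line for line in lines
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]


def fix_gnuplot_script(path):
    """Rewrite the script at path in place without the unsupported line."""
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, "w") as handle:
        handle.writelines(drop_mouse_clipboardformat(lines))
```

After calling fix_gnuplot_script("out.gp"), the plot can be regenerated by running gnuplot on out.gp.<br />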
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that markers from the same chromosome land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
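<br />
The blast output above uses -outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A minimal sketch of picking the best hit per marker is shown below; it is not the blastparse.py used here, and choosing the hit by highest bit score is an assumption.<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} for the top hit."""
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, send = int(cols[8]), int(cols[9])
        bitscore = float(cols[11])
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][3]:
            # sstart > send for minus-strand hits, so store ordered coordinates.
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best
```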
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads appeared to be split-mapped, so they were re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
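<br />
samtools depth writes three tab-separated columns (sequence name, 1-based position, depth). To compare the Illumina and PacBio depth files, the per-base values can be averaged in windows, as in this sketch; the function name and the window size are assumptions.<br />

```python
def window_mean_depth(lines, window=10000):
    """Return {(sequence, window_index): mean depth} from samtools depth lines."""
    totals = {}
    counts = {}
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        # Positions are 1-based, so position 1 falls in window 0.
        key = (cols[0], (int(cols[1]) - 1) // window)
        totals[key] = totals.get(key, 0) + int(cols[2])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}
```

Windows where the two data sets disagree strongly in depth are candidates for closer inspection.<br />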
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
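<br />
contig_compare.sh expects the contigs to be split into 20 files (index 0 to 19). fasta_devide.py is not shown in these notes; the sketch below shows one possible way to distribute FASTA records over a fixed number of chunks, with a made-up function name and a round-robin assignment that is an assumption.<br />

```python
def split_fasta(lines, n_chunks=20):
    """Distribute FASTA records round-robin over n_chunks lists of lines."""
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            # A header starts the next record.
            index += 1
        if index >= 0:
            chunks[index % n_chunks].append(line)
    return chunks
```

Each returned list can then be written to its own file, one per blat job.<br />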
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
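<br />
The contents of fasta_devide.py are not recorded in this note. The following is only a minimal sketch of one way to split a multi-FASTA into parts of similar total length so that RepeatMasker can be run on the parts in parallel. The function names and the balancing strategy are assumptions, not the original script.<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts):
    """Assign records, longest first, to whichever part is currently smallest."""
    parts = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))
        parts[i].append((header, seq))
        sizes[i] += len(seq)
    return parts


# Small in-memory demonstration (hypothetical sequences)
records = list(read_fasta([">a", "ACGT", "AC", ">b", "GG", ">c", "TTTTTTTT"]))
parts = split_records(records, 2)
```

Each part would then be written to its own file and masked separately. The record order changes, which does not matter for the later cat of the masked files.<br />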
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain the '(' symbol, which causes a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
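<br />
The header reformatting script used above is not shown in this note. A minimal sketch of the idea, assuming it is enough to replace parentheses in FASTA header lines with underscores (the replacement character is an assumption):<br />

```python
def reformat_headers(lines):
    """Yield FASTA lines with '(' and ')' replaced by '_' in header lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line.replace("(", "_").replace(")", "_")
        else:
            yield line


# Hypothetical example record
cleaned = list(reformat_headers([">LTR(1)\n", "ACGT\n"]))
```

Sequence lines are passed through unchanged, so the library content itself is not altered.<br />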
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
mkdir tmp<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library, temporary directory)<br />
mpiexec -n 30 maker -fix_nucleotides maker_opts.ctl maker_bopts.ctl maker_exe.ctl >& maker_opts.ctl.log<br />
<br />
== 6/26 ==<br />
=== Mungbean pacbio assembly ===<br />
Checking synteny blocks to decide where chromosomes should be split or combined<br />
<br />
<br />
blast<br />
(NICEM:~/data/Mungbean/blast)<br />
makeblastdb -in Vradi.ver6.cor.pep.fa -dbtype 'prot'<br />
blastall -i adzuki.ver3.pep.fa.tr.cor.fa -d Vradi.ver6.cor.pep.fa -p blastp -e 1e-10 -b 5 -v 5 -m 8 -o mcscanx/old_Va.blast<br />
# same procedure for other organism protein<br />
<br />
<br />
MCScanX<br />
(NICEM:~/data/Mungbean/blast/mcscanx)<br />
python gffcombine.py Vradi_ver6.gff.sorted.by.TY.gff adzuki.ver3.gene.gff.cor.gff > old_Va.gff<br />
~/data/program/MCScanX/MCScanX old_Va<br />
# same procedure for other organism protein, just change the species name in gffcombine.py and command</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-09T01:53:40Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parsed the genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so the map was constructed again.<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 were filtered out.<br />
While grouping them, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; this needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
Moving SAM data (reseq data aligned to the PacBio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Constructed the genetic map (JoinMap 4.1), using only the combined chr 3/4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML mapping method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
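<br />
PE-pairing.py itself is not included in this note. The sketch below shows one possible way to re-pair two FASTQ files by keeping only reads whose IDs occur in both. It assumes 4-line FASTQ records and read names that match after removing a trailing /1 or /2; all names are hypothetical.<br />

```python
def read_fastq(lines):
    """Yield 4-line FASTQ records as lists of stripped lines."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []


def read_id(header):
    """Read name without '@', without the comment, and without a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_reads(fq1_lines, fq2_lines):
    """Return two record lists restricted to IDs found in both inputs, in file-1 order."""
    mates = {read_id(rec[0]): rec for rec in read_fastq(fq2_lines)}
    out1, out2 = [], []
    for rec in read_fastq(fq1_lines):
        rid = read_id(rec[0])
        if rid in mates:
            out1.append(rec)
            out2.append(mates[rid])
    return out1, out2


# Hypothetical reads: r1 is paired, r2 and r3 are orphans
fq1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGGG", "+", "IIII"]
fq2 = ["@r3/2", "TTTT", "+", "IIII", "@r1/2", "CCCC", "+", "IIII"]
paired1, paired2 = pair_reads(fq1, fq2)
```

This version holds the second file in memory, which is fine for a chloroplast read subset but would need a different approach for a full sequencing run.<br />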
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig has LSC+IR, and the other has SSC+IR).<br />
<br />
These were then simply assembled together (cp_1.fa, cp_2.fa, cp_3.fa).<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio_chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports sam or bam, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
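<br />
What vcf_filtering.py does is not recorded here. Judging only from the output name, it may filter on a 0.15 threshold. The sketch below shows one plausible reading under that assumption: keep sites where the non-reference read fraction, taken from the per-sample AD field written by mpileup, is at least 0.15. Both the meaning of the threshold and the function names are assumptions.<br />

```python
def alt_fraction(vcf_line):
    """Non-reference read fraction from the first sample's AD field, or None if unavailable."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys:
        return None
    depths = [int(x) for x in values[keys.index("AD")].split(",") if x != "."]
    total = sum(depths)
    if total == 0:
        return None
    return (total - depths[0]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines and data lines whose alt fraction reaches min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        frac = alt_fraction(line)
        if frac is not None and frac >= min_fraction:
            yield line


# Hypothetical records: 5/40 alt reads (0.125) and 10/40 alt reads (0.25)
demo = [
    "##fileformat=VCFv4.2\n",
    "cp\t10\t.\tA\tG,<*>\t0\t.\tDP=40\tPL:AD\t0,9,90:35,5,0\n",
    "cp\t20\t.\tC\tT,<*>\t0\t.\tDP=40\tPL:AD\t0,9,90:30,10,0\n",
]
kept = list(filter_vcf(demo))
```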
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
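<br />
getCDS.py is not reproduced in this note. As an illustration of the GFF case only, the sketch below joins the CDS segments that share a Parent (or ID) attribute and reverse-complements the result for minus-strand genes. The grouping rule and the function names are assumptions, and trans-spliced genes are not handled.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into a dict."""
    return dict(item.split("=", 1) for item in column9.strip().split(";") if "=" in item)


def cds_sequences(genome, gff_lines):
    """genome: dict of sequence name -> sequence. Returns {CDS group name: spliced CDS}."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        name = attrs.get("Parent") or attrs.get("ID") or attrs.get("Name")
        parts[name].append((int(cols[3]), int(cols[4]), cols[0], cols[6]))
    result = {}
    for name, segments in parts.items():
        segments.sort()  # genomic order; GFF coordinates are 1-based and inclusive
        seq = "".join(genome[chrom][start - 1:end] for start, end, chrom, _ in segments)
        if segments[0][3] == "-":
            seq = reverse_complement(seq)
        result[name] = seq
    return result


# Hypothetical toy genome and annotation
genome = {"cp": "AAACCCGGGTTT"}
gff = [
    "cp\t.\tCDS\t1\t3\t.\t+\t0\tID=g1\n",
    "cp\t.\tCDS\t4\t6\t.\t-\t0\tID=g2\n",
    "cp\t.\tCDS\t10\t12\t.\t-\t0\tID=g2\n",
]
cds = cds_sequences(genome, gff)
```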
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
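<br />
The vcfparse scripts are not shown here. The thresholds noted above (dp >= 5, missing < 13, hetero < 10) could be applied roughly as in the sketch below. It assumes that the depth cut-off is per sample, that low-depth calls count as missing, and that the missing and hetero limits are numbers of samples. These are guesses about the original scripts.<br />

```python
def genotype_counts(vcf_line, min_dp=5):
    """Count missing and heterozygous sample calls on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    gt_i = keys.index("GT")
    dp_i = keys.index("DP") if "DP" in keys else None
    missing = hetero = 0
    for sample in fields[9:]:
        values = sample.split(":")
        alleles = values[gt_i].replace("|", "/").split("/")
        depth = None
        if dp_i is not None and dp_i < len(values) and values[dp_i].isdigit():
            depth = int(values[dp_i])
        if "." in alleles or (depth is not None and depth < min_dp):
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing, hetero


def keep_site(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """True when the site has fewer than max_missing missing and max_hetero hetero calls."""
    missing, hetero = genotype_counts(vcf_line, min_dp)
    return missing < max_missing and hetero < max_hetero


# Hypothetical site with four samples
site = "scf1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/0:10\t0/1:8\t./.:0\t1/1:2\n"
counts = genotype_counts(site)
```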
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add the path /home/skyts0401/lastz-distrib/bin to .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer, having a problem with memory, was re-installed with a memory configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
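<br />
Instead of editing out.gp by hand each time, the offending line can be dropped with a few lines of Python. This is only a sketch: the helper name is hypothetical, and it removes every line that starts with the unsupported setting.<br />

```python
def drop_unsupported_lines(lines, needle="set mouse clipboardformat"):
    """Return the gnuplot script lines without those that start with `needle`."""
    return [line for line in lines if not line.lstrip().startswith(needle)]


# Hypothetical excerpt of a mummerplot-generated script
script = [
    "set terminal png\n",
    'set mouse clipboardformat "[%.0f, %.0f]"\n',
    "plot 'out.fplot'\n",
]
fixed = drop_unsupported_lines(script)

# To apply it to the real file:
#   with open("out.gp") as handle:
#       lines = handle.readlines()
#   with open("out.gp", "w") as handle:
#       handle.writelines(drop_unsupported_lines(lines))
```

After rewriting the file, gnuplot can be run on out.gp again to produce the plot.<br />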
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
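<br />
blastparse.py is not included in this note. One common way to reduce the tabular BLAST output (-outfmt 6) used above to a single position per marker is to keep the highest-bitscore hit for each query, as sketched below. Whether the original script used this rule is an assumption.<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, subject_start, bitscore)} for the top-scoring hit per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        query, subject = cols[0], cols[1]
        s_start, bitscore = int(cols[8]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, s_start, bitscore)
    return best


# Hypothetical hits for two markers
hits = [
    "m1\tchr1\t100.0\t200\t0\t0\t1\t200\t5000\t5199\t1e-100\t370\n",
    "m1\tchr3\t95.0\t200\t10\t0\t1\t200\t800\t999\t1e-80\t300\n",
    "m2\tchr2\t99.0\t200\t2\t0\t1\t200\t120\t319\t1e-95\t359\n",
]
placed = best_hits(hits)
```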
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
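The two depth files can be summarised per window before comparing them. A minimal Python sketch, assuming the standard three-column `samtools depth` output (contig, position, depth); the function name and the window size are illustrative and were not part of the original analysis:<br />

```python
from collections import defaultdict


def windowed_mean_depth(lines, window=10000):
    """Return {(contig, window_start): mean depth} from samtools depth lines.

    Assumes each line is 'contig<TAB>position<TAB>depth' with 1-based positions.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [depth sum, number of positions]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        start = (pos - 1) // window * window + 1  # 1-based start of the window
        entry = totals[(contig, start)]
        entry[0] += depth
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Usage (hypothetical): open newcontig_illumina_endtoend.mapping.depth and pass the file object to windowed_mean_depth().<br />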
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
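The contents of fasta_devide.py are not recorded in this note. A minimal sketch of how such a splitter could work, assuming records are distributed round-robin into 20 files named like the `_devide${i}.fa` inputs of contig_compare.sh; all function names here are hypothetical:<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts):
    """Distribute records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def write_parts(parts, prefix):
    """Write each part to '<prefix>_devide<i>.fa' (naming assumed from the note)."""
    for i, part in enumerate(parts):
        with open(f"{prefix}_devide{i}.fa", "w") as out:
            for header, seq in part:
                out.write(f">{header}\n{seq}\n")
```

Round-robin splitting keeps the number of records per file even, but not the total sequence length; splitting by cumulative length would balance the blat jobs better.<br />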
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
The mungbean super scaffold (JM-2.fasta) was gap-filled. The final assembly FASTA is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration is required at http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install the following -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR_99 results, make the directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain the '(' symbol, which causes a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
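The header reformatting script used above is not shown in this note. A sketch of one way to do it, assuming it is enough to replace '(' and ')' in FASTA header lines with underscores while leaving sequence lines untouched; the function names are hypothetical:<br />

```python
import re


def sanitize_header(line):
    """Replace parentheses in a FASTA header line; return other lines unchanged."""
    if not line.startswith(">"):
        return line
    return ">" + re.sub(r"[()]", "_", line[1:])


def sanitize_fasta(lines):
    """Apply sanitize_header to every line of a FASTA file."""
    return [sanitize_header(line) for line in lines]
```

The '#class' suffix that RepeatMasker libraries use after the sequence name is kept as it is.<br />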
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
mkdir tmp<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library, temporary directory)<br />
mpiexec -n 30 maker -fix_nucleotides maker_opts.ctl maker_bopts.ctl maker_exe.ctl >& maker_opts.ctl.log</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-08T06:55:46Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map is constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
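The marker filter described above can be sketched as follows. This is only an illustration: it assumes JoinMap-style genotype codes ('a', 'b', 'h', and '-' for missing), treats the hetero threshold as a count of individuals, and treats depth as a mean depth per marker; none of these details are stated in the note, and the function name is hypothetical:<br />

```python
def keep_marker(genotypes, mean_depth,
                max_missing_frac=0.10, max_hetero=10, min_depth=3):
    """Return True if a marker passes the missing, hetero and depth filters."""
    n = len(genotypes)
    if n == 0:
        return False
    missing = sum(g == "-" for g in genotypes)  # assumed missing code
    hetero = sum(g == "h" for g in genotypes)   # assumed heterozygous code
    if missing / n > max_missing_frac:
        return False
    if hetero > max_hetero:
        return False
    if mean_depth < min_depth:
        return False
    return True
```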
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML mapping algorithm, Haldane mapping function<br />
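For reference, the Haldane mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) in cM. A small Python sketch of the conversion in both directions (the function names are illustrative; JoinMap does this internally):<br />

```python
import math


def haldane_cm(r):
    """Map distance in cM for recombination fraction r, with 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Recombination fraction for a map distance d_cm given in cM."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))
```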
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
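PE-pairing.py itself is not recorded here. A minimal sketch of read pairing, assuming that it keeps only reads whose mate is present in both FASTQ files and that mates share a name apart from a trailing /1 or /2; the function names are hypothetical:<br />

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines into records
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def read_key(header):
    """Read name without the leading '@', the comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_reads(records1, records2):
    """Return (read1, read2) tuples for reads present in both inputs."""
    mates = {read_key(rec[0]): rec for rec in records2}
    return [(rec, mates[read_key(rec[0])])
            for rec in records1 if read_key(rec[0]) in mates]
```

This version holds the second file in memory, so for large read sets an approach based on sorted files or an on-disk index would be more suitable.<br />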
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads need to be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align by using blasr algorithm , but sam or bam is no longer supported in blasr, so just use bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
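vcf_filtering.py itself is not listed in this note. A minimal sketch of how such a filter might work, assuming that 0.15 in the output file name is a minimum alternate-allele fraction taken from the AD field of the first sample (the function names and the meaning of the threshold are assumptions, not the original script):<br />

```python
def alt_fraction(fmt, sample):
    """Return the alternate-allele fraction from the AD field, or None."""
    keys = fmt.split(":")
    values = sample.split(":")
    if "AD" not in keys:
        return None
    depths = values[keys.index("AD")].split(",")
    try:
        counts = [int(d) for d in depths]
    except ValueError:  # e.g. "." when the depth is missing
        return None
    total = sum(counts)
    if total == 0:
        return None
    # counts[0] is the reference allele; the rest are alternate alleles.
    return sum(counts[1:]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines and records whose ALT fraction >= min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        frac = alt_fraction(fields[8], fields[9])
        if frac is not None and frac >= min_fraction:
            yield line
```

With an open file handle, sys.stdout.writelines(filter_vcf(handle)) would print the filtered VCF in the same way the command above redirects it.<br />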
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
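getCDS.py is not shown here. A rough sketch of the GFF case, assuming GFF3 rows of type CDS that carry a Parent (or ID) attribute; the function names are hypothetical, and genes with unusual structure such as trans-spliced chloroplast genes would need extra handling:<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Parse FASTA lines into a dict of sequence id -> sequence."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {key: "".join(chunks) for key, chunks in seqs.items()}


def cds_sequences(fasta_lines, gff_lines):
    """Return {parent id: CDS sequence} built from GFF3 'CDS' rows."""
    genome = read_fasta(fasta_lines)
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        gene = attrs.get("Parent") or attrs.get("ID") or "unknown"
        parts[gene].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for gene, segs in parts.items():
        segs.sort(key=lambda seg: seg[1])
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            # Minus-strand genes are reverse-complemented after joining.
            seq = seq.translate(COMPLEMENT)[::-1]
        result[gene] = seq
    return result
```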
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
SNP calling is done; filter the SNPs for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (the scaffold names are too long, so '|' is removed)<br />
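The thresholds noted for vcfparse.py (dp >= 5, missing < 13, hetero < 10) could be applied per VCF record roughly as follows. This is only a sketch under assumptions: calls with DP below the cutoff are counted as missing, the FORMAT column contains GT, and the function names are hypothetical; the real script may define these counts differently.<br />

```python
def genotype_counts(fmt, samples, min_dp=5):
    """Count (missing, hetero) calls; calls with DP < min_dp count as missing."""
    keys = fmt.split(":")
    gt_i = keys.index("GT")
    dp_i = keys.index("DP") if "DP" in keys else None
    missing = hetero = 0
    for sample in samples:
        values = sample.split(":")
        alleles = values[gt_i].replace("|", "/").split("/")
        depth = None
        if dp_i is not None and dp_i < len(values) and values[dp_i].isdigit():
            depth = int(values[dp_i])
        if "." in alleles or (depth is not None and depth < min_dp):
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing, hetero


def keep_record(line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if a VCF data line passes the missing and hetero limits."""
    fields = line.rstrip("\n").split("\t")
    missing, hetero = genotype_counts(fields[8], fields[9:], min_dp)
    return missing < max_missing and hetero < max_hetero
```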
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
Install MUMmer to draw a dot plot between the PacBio assembly and the previous reference<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
There is a problem with the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add /home/skyts0401/lastz-distrib/bin to PATH in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt with the 64-bit compile flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the PacBio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
Then draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp to delete that line.<br />
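Instead of editing out.gp by hand, the offending line can be dropped with a few lines of Python. A small sketch, assuming every rejected line starts with "set mouse clipboardformat" as in the error above (the function names are hypothetical):<br />

```python
def strip_mouse_options(lines):
    """Drop the 'set mouse clipboardformat' lines that newer gnuplot rejects."""
    return [line for line in lines
            if not line.lstrip().startswith("set mouse clipboardformat")]


def fix_gnuplot_script(path):
    """Rewrite a mummerplot-generated gnuplot script in place."""
    with open(path) as handle:
        kept = strip_mouse_options(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
```

After calling fix_gnuplot_script("out.gp"), the plot can be regenerated by running gnuplot on out.gp.<br />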
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
Compare the PacBio assembly with the previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (the output can be changed through options in the Python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
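blastparse.py is not listed in this note. One plausible way to reduce the -outfmt 6 table to a single position per marker is to keep the hit with the highest bit score. The sketch below assumes the default 12-column outfmt 6 layout; the function name and the choice of bit score as the ranking key are assumptions:<br />

```python
def best_hits(lines):
    """Return {query: (subject, subject start, bitscore)} for the top hit per query."""
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        # Default outfmt 6: column 9 is sstart and column 12 is bitscore.
        sstart, bitscore = int(cols[8]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, sstart, bitscore)
    return best
```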
<br />
2. Contig comparison. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools commands for newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
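The two .mapping.depth files are three-column tables (contig, 1-based position, depth). To compare Illumina and PacBio coverage over the region, a windowed mean is often easier to read than per-base values. A minimal sketch; the window size and function name are arbitrary choices, not part of the original workflow:<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(contig, window_start): mean depth} from `samtools depth` rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        contig, pos, depth = line.split("\t")[:3]
        # samtools depth positions are 1-based.
        start = (int(pos) - 1) // window * window + 1
        totals[(contig, start)] += int(depth)
        counts[(contig, start)] += 1
    return {key: totals[key] / counts[key] for key in totals}
```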
<br />
blat for comparing contigs (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
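fasta_devide.py is not listed in this note. The loop in contig_compare.sh expects 20 files numbered 0 to 19, so a sketch under that assumption could distribute the FASTA records round-robin. The function names and the output prefix argument are hypothetical:<br />

```python
def split_fasta_records(lines, n_parts=20):
    """Distribute FASTA records round-robin into n_parts lists of lines."""
    parts = [[] for _ in range(n_parts)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1
        if index >= 0:  # ignore anything before the first header
            parts[index % n_parts].append(line)
    return parts


def write_parts(path, prefix, n_parts=20):
    """Write <prefix>_devide<i>.fa files, one per part."""
    with open(path) as handle:
        parts = split_fasta_records(handle, n_parts)
    for i, chunk in enumerate(parts):
        with open(f"{prefix}_devide{i}.fa", "w") as out:
            out.writelines(chunk)
```

Round-robin keeps the number of records per file even, but not the number of bases, so parts holding the longest contigs will take longer in blat.<br />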
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
Make the Jatropha figure (chr - LG) for the new version (ALLMAPS) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
The mungbean super scaffold (JM-2.fasta) was gap-filled. The final assembly FASTA is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(a Libraries/ directory is created and all files are copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (v2.6.0 had installation problems, so v2.2.28 was installed)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure the directories of trf and rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR99 results, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain '(' characters, which cause a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
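The header reformatting step in this block (headerforamt.py) is not listed in the note. A minimal sketch of one way to do it, assuming it is enough to replace parentheses in FASTA header lines with underscores (the replacement character and function names are assumptions):<br />

```python
import re


def clean_header(line):
    """Replace '(' and ')' in a FASTA header line; sequence lines pass through."""
    if line.startswith(">"):
        return re.sub(r"[()]", "_", line)
    return line


def clean_fasta(lines):
    """Apply clean_header to every line of a FASTA file."""
    return [clean_header(line) for line in lines]
```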
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
To assess the assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the augustus paths (path = /data/skyts0401/program/augustus-3.2.3/scripts/ and bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
mkdir tmp<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (set the paths to ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (set the paths to the genome, evidence (transcriptome, protein), repeat library, and temporary directory)</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-08T06:17:51Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
Moving SAM data (reseq data aligned to the PacBio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
cd GenomicConsensus/<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but sam or bam is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
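getCDS.py itself is not recorded in these notes. The GFF half of such a script could look roughly like the sketch below, which assumes GFF3 input with CDS rows grouped by their Parent (or ID) attribute; the function names are hypothetical, GenBank (.gb) input is not covered, and trans-spliced genes would need extra handling.<br />

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name = first word of the header."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: ''.join(chunks) for key, chunks in seqs.items()}


COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_sequences(genome, gff_lines):
    """Return {parent_id: concatenated CDS sequence} from GFF3 'CDS' rows."""
    parts = {}
    for line in gff_lines:
        if line.startswith('#') or not line.strip():
            continue
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 9 or cols[2] != 'CDS':
            continue
        attrs = dict(f.split('=', 1) for f in cols[8].split(';') if '=' in f)
        key = attrs.get('Parent') or attrs.get('ID')
        if key is None:
            continue
        parts.setdefault(key, []).append(
            (int(cols[3]), int(cols[4]), cols[6], cols[0]))
    result = {}
    for key, segs in parts.items():
        segs.sort()  # genome order; GFF coordinates are 1-based, inclusive
        seq = ''.join(genome[chrom][start - 1:end]
                      for start, end, strand, chrom in segs)
        if segs[0][2] == '-':
            seq = revcomp(seq)  # minus strand: reverse the joined exons
        result[key] = seq
    return result
```

Joining the segments in genome order and then reverse-complementing the whole string gives the exons in transcript order for minus-strand genes.<br />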
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold names are too long, so '|' is removed)<br />
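The '|' removal done by locparse.py could be sketched as follows. The real script is not shown here, so the function names and the assumption that header lines of the loc file (name, popt, nloc, nind) contain '=' while locus lines start with the locus name are illustrative only.<br />

```python
def clean_locus_name(name):
    """Drop '|' characters from a locus/scaffold name."""
    return name.replace('|', '')


def clean_loc_lines(lines):
    """Yield loc-file lines with '|' removed from the first field of locus lines.

    Lines containing '=' (header lines) and blank lines pass through unchanged.
    """
    for line in lines:
        text = line.rstrip('\n')
        if not text.strip() or '=' in text:
            yield text
            continue
        fields = text.split(None, 1)
        fields[0] = clean_locus_name(fields[0])
        yield ' '.join(fields)
```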
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
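cal_seg_dist.py is not recorded in these notes. A segregation distortion check of the kind its name suggests could be a chi-square test against the expected genotype ratio. The sketch below assumes a population with an expected 1:1 ratio of the two homozygous classes (for example RILs) and a 5% significance level; both are assumptions, as are the function names.<br />

```python
CHI2_CRIT_005 = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def seg_dist_chi2(n_a, n_b):
    """Chi-square statistic (1 d.f.) for deviation from a 1:1 a:b ratio."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = total / 2.0
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def is_distorted(n_a, n_b, critical=CHI2_CRIT_005):
    """True if the marker deviates significantly from 1:1."""
    return seg_dist_chi2(n_a, n_b) > critical
```

Missing and heterozygous calls are simply left out of the counts in this sketch.<br />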
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add /home/skyts0401/lastz-distrib/bin to PATH in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt with 64-bit support (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot, so just edit out.gp and delete that line.<br />
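Deleting the offending line from out.gp by hand works. If many plots are made, the same edit could be scripted as in this sketch (the function names are hypothetical; only the "set mouse clipboardformat" line is removed, everything else is left as mummerplot wrote it).<br />

```python
def drop_clipboardformat(lines):
    """Return gnuplot script lines without 'set mouse clipboardformat ...'."""
    return [ln for ln in lines
            if not ln.lstrip().startswith('set mouse clipboardformat')]


def fix_gnuplot_script(path):
    """Rewrite a gnuplot script in place without the unsupported line."""
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, 'w') as handle:
        handle.writelines(drop_clipboardformat(lines))
```

After fix_gnuplot_script('out.gp'), the plot can be regenerated with gnuplot out.gp.<br />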
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
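blastparse.py is not recorded in these notes. Tabular BLAST output (-outfmt 6) has 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so a parser that keeps the best hit per marker could look like the sketch below. Keeping the highest-bitscore hit is an assumption about what the real script does, and the function name is made up.<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, sstart, send, bitscore)} for the top-scoring hit."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 12:
            continue  # skip blank or malformed rows
        query, subject = cols[0], cols[1]
        sstart, send = int(cols[8]), int(cols[9])
        bitscore = float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```

When sstart is larger than send, the hit is on the minus strand of the subject.<br />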
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Found what looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
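The samtools depth output has three tab-separated columns: contig, 1-based position and depth. To compare the Illumina and PacBio profiles over the 2 Mb region, the per-base values could be averaged in windows, as in this sketch (the 10 kb default window and the function name are arbitrary choices for illustration).<br />

```python
from collections import defaultdict


def windowed_mean_depth(depth_lines, window=10000):
    """Mean depth per (contig, window start) from 'contig<TAB>pos<TAB>depth' rows."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        contig, pos, depth = cols[0], int(cols[1]), int(cols[2])
        start = (pos - 1) // window * window + 1  # 1-based start of the window
        sums[(contig, start)] += depth
        counts[(contig, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Because -a was used, zero-depth positions are present in the file and are included in the means.<br />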
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
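fasta_devide.py is not recorded in these notes. Since contig_compare.sh loops over parts 0 to 19, it presumably splits the contig FASTA into 20 files. A sketch of such a split is given below; the round-robin assignment, the function names and the exact output naming are assumptions.<br />

```python
def split_fasta(records, n_parts=20):
    """Distribute (header, sequence) records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def write_parts(parts, prefix):
    """Write each part to '<prefix>_devide<i>.fa' and return the file names."""
    names = []
    for i, part in enumerate(parts):
        name = '%s_devide%d.fa' % (prefix, i)
        with open(name, 'w') as out:
            for header, seq in part:
                out.write('>%s\n%s\n' % (header, seq))
        names.append(name)
    return names
```

Round-robin assignment keeps the number of contigs per file balanced, although not the total sequence length, so the parallel blat jobs can still finish at different times.<br />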
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, try these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
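The contents of fasta_devide.py are not recorded in this note. A minimal sketch of how such a splitter could work, assuming it only needs to spread the records over a fixed number of parts of similar total length so that RepeatMasker can run on them in parallel. The function names and the greedy balancing are assumptions for illustration, not the actual script.<br />

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], n_parts: int) -> List[List[Record]]:
    """Greedy balancing: the longest records go first, each into the smallest part."""
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))
        parts[i].append((header, seq))
        sizes[i] += len(seq)
    return parts
```

Each part would then be written to its own file (the note uses names ending in _devide followed by a number and .fa) and masked separately before the results are concatenated.<br />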
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain '(' characters, which cause a ProtExcluder error, so the headers are reformatted)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
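The header reformatting script used above is not recorded here. A minimal sketch of the idea, assuming it is enough to replace parentheses, square brackets and '|' in FASTA headers with underscores while leaving sequence lines untouched. The character set and function names are assumptions, not the original script.<br />

```python
import re
from typing import Iterable, Iterator

# Characters assumed to be problematic in library headers.
_BAD_CHARS = re.compile(r"[()\[\]|]")


def sanitize_header(header: str) -> str:
    """Replace each problematic character in a FASTA header with '_'."""
    return _BAD_CHARS.sub("_", header)


def reformat_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sanitized headers; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + sanitize_header(line[1:])
        else:
            yield line
```

The '#' and '/' characters are left alone on purpose, since RepeatMasker-style library headers use them for the class/family label.<br />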
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
To assess the assembly, run CEGMA and BUSCO.<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default, check blast_type=ncbi+)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library)</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-08T06:16:36Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
The linkage group 2 map turned out to be wrong, so the map is constructed again.<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out.<br />
While grouping, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; this needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
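PE-pairing.py itself is not shown in this note. A minimal sketch of one way to pair reads, assuming the goal is to keep only reads whose ID appears in both FASTQ files and that mates share an ID apart from an optional /1 or /2 suffix. All function names are hypothetical.<br />

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header line, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield FASTQ records, four lines at a time; an incomplete tail is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header, seq, qual


def read_id(header: str) -> str:
    """Read name without the leading '@', the comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_reads(reads1: Iterable[Read], reads2: Iterable[Read]) -> List[Tuple[Read, Read]]:
    """Return (mate1, mate2) tuples for IDs present in both inputs, in file-1 order."""
    by_id = {read_id(r[0]): r for r in reads2}  # holds file 2 in memory
    return [(r, by_id[read_id(r[0])]) for r in reads1 if read_id(r[0]) in by_id]
```

Keeping the second file in a dictionary is simple but memory-hungry; for large libraries a disk-backed index or sorted inputs would be a safer choice.<br />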
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other has SSC+IR).<br />
<br />
These are simply assembled together (cp_1.fa, cp_2.fa, cp_3.fa).<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
cd GenomicConsensus/<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM or BAM is no longer supported in blasr, so the bowtie algorithm is used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error).<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Write a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
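getCDS.py is not reproduced in this note. A minimal sketch of the GFF half of the task, assuming GFF3 input where CDS segments of one gene share a Parent (or ID) attribute and all segments of a gene lie on the same strand. GenBank input and trans-spliced genes are not handled, and all names are hypothetical.<br />

```python
from collections import defaultdict
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_from_gff(genome: Dict[str, str], gff_lines: Iterable[str]) -> Dict[str, str]:
    """Map each CDS group (Parent, else ID) to its spliced coding sequence."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        name = attrs.get("Parent") or attrs.get("ID") or f"{cols[0]}:{cols[3]}-{cols[4]}"
        # GFF coordinates are 1-based and inclusive.
        segments[name].append((int(cols[3]), int(cols[4]), cols[6], cols[0]))
    result = {}
    for name, segs in segments.items():
        segs.sort()  # ascending by start position
        seq = "".join(genome[seqid][start - 1:end] for start, end, _strand, seqid in segs)
        if segs[0][2] == "-":
            seq = reverse_complement(seq)
        result[name] = seq
    return result
```

Joining the segments in ascending order and then reverse-complementing the whole string gives the correct 5' to 3' order for minus-strand genes.<br />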
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
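The vcfparse.py filter is not shown here. A minimal sketch of a per-marker filter, assuming the thresholds noted above (dp >= 5, missing < 13, hetero < 10) are counts of samples and that a call below the depth cutoff is treated as missing. Both points are assumptions, as is the function name.<br />

```python
from typing import Iterable, Tuple


def keep_marker(
    calls: Iterable[Tuple[str, int]],
    min_depth: int = 5,
    max_missing: int = 13,
    max_hetero: int = 10,
) -> bool:
    """Decide whether a marker passes, given (genotype, depth) per sample.

    Genotypes are VCF-style strings such as '0/0', '0/1', '1/1' or './.'.
    """
    missing = hetero = 0
    for genotype, depth in calls:
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles or depth < min_depth:
            missing += 1  # no call, or too shallow to trust
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Tightening max_missing is the kind of change described in the next entry, where the marker count was still too high for JoinMap.<br />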
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12.<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile.<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The Makefile causes a build problem, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add /home/skyts0401/lastz-distrib/bin to $PATH in ~/.profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with 64-bit support enabled <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option written by mummerplot. Just edit out.gp to delete that line.<br />
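The manual edit of out.gp can also be scripted. A minimal sketch, assuming the only offending statement is the one beginning with "set mouse clipboardformat"; the function name is hypothetical.<br />

```python
from typing import Iterable, List


def drop_clipboardformat(lines: Iterable[str]) -> List[str]:
    """Return the gnuplot script lines without 'set mouse clipboardformat ...'."""
    return [
        line for line in lines
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]


# Example use on the generated script (file name taken from the error message):
#   with open("out.gp") as handle:
#       cleaned = drop_clipboardformat(handle)
#   with open("out.gp", "w") as handle:
#       handle.writelines(cleaned)
```

After rewriting the file, running gnuplot on out.gp should produce the plot without rerunning the alignment.<br />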
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
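blastparse.py is not included in this note. A minimal sketch of the kind of parsing it would need, assuming the default 12-column layout of -outfmt 6 and that one best hit per marker (highest bit score) is wanted. The function name and the choice of bit score as the criterion are assumptions.<br />

```python
from typing import Dict, Iterable, Tuple

# Default -outfmt 6 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
Hit = Tuple[str, int, int, float]  # (subject, start, end, bit score)


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query in tabular BLAST output."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        bits = float(fields[11])
        if query not in best or bits > best[query][3]:
            # Minus-strand hits have sstart > send, so normalize the order.
            best[query] = (subject, min(s_start, s_end), max(s_start, s_end), bits)
    return best
```

The resulting marker-to-chromosome positions are what a plotting script would need to draw links between the old and new assemblies.<br />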
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
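To compare the two depth files it helps to summarize them in windows. A minimal sketch, assuming the three-column samtools depth output (reference name, 1-based position, depth) for a single BAM; the window size and function name are arbitrary choices for illustration.<br />

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def windowed_mean_depth(
    lines: Iterable[str], window: int = 10000
) -> Dict[Tuple[str, int], float]:
    """Average depth per fixed-size window, keyed by (reference, window start)."""
    sums: Dict[Tuple[str, int], int] = defaultdict(int)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        # Window start is 1-based, e.g. 1, 10001, 20001 for window=10000.
        start = (int(pos) - 1) // window * window + 1
        sums[(chrom, start)] += int(depth)
        counts[(chrom, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Windows where the Illumina and PacBio means diverge sharply are candidates for the split-mapping regions noted above.<br />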
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
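The pslfilter scripts are not recorded in this note. A minimal sketch of a PSL filter, assuming alignments are kept when they cover most of the query contig at high identity. The thresholds (0.9 coverage, 0.95 identity) and the function name are assumptions, not the values actually used.<br />

```python
from typing import Iterable, List, Tuple

# PSL columns used here (0-based): 0 matches, 1 misMatches, 2 repMatches,
# 9 qName, 10 qSize, 13 tName. A full PSL row has 21 columns.
Kept = Tuple[str, str, float, float]  # (query, target, identity, query coverage)


def filter_psl(
    lines: Iterable[str], min_query_cov: float = 0.9, min_identity: float = 0.95
) -> List[Kept]:
    """Return alignments passing both thresholds; header lines are skipped."""
    kept: List[Kept] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # blat header or malformed row
        matches, mismatches, rep_matches = (int(x) for x in fields[:3])
        q_size = int(fields[10])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        identity = (matches + rep_matches) / aligned
        coverage = aligned / q_size
        if identity >= min_identity and coverage >= min_query_cov:
            kept.append((fields[9], fields[13], identity, coverage))
    return kept
```

Running it over every file named in psl.list and concatenating the results would give one table of contig-to-contig matches.<br />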
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration is required at http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(The Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the Mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTR elements)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR_99 results, make separate directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
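<br />
The genome is split into pieces here so that RepeatMasker can run on them in parallel. fasta_devide.py itself is not shown in these notes, so the following is only a minimal sketch of that kind of splitter; the function names and the largest-first balancing are assumptions, not the original script.<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_chunks):
    """Spread records over n_chunks lists with roughly equal total length.

    Records are placed largest first, each onto the currently smallest chunk.
    """
    chunks = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))
        chunks[i].append((header, seq))
        sizes[i] += len(seq)
    return chunks
```

Each chunk would then be written to its own file (for example standard_output.gapfilled.final_devide0.fa, following the naming used above). Note that concatenating the masked chunks afterwards does not preserve the original record order.<br />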
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (LTR library headers contain '(' symbols, which cause a ProtExcluder error, so reformat the headers)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
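<br />
The header reformatting step above only needs to remove parentheses from FASTA header lines. The script itself is not included in these notes; a minimal sketch, assuming that replacing '(' and ')' with '_' is acceptable, could look like this (function names are hypothetical):<br />

```python
import re


def clean_header(header):
    """Replace parentheses in a FASTA header with underscores."""
    return re.sub(r"[()]", "_", header)


def reformat_fasta(lines):
    """Yield FASTA lines with cleaned headers; sequence lines are untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield clean_header(line) if line.startswith(">") else line
```

Printing the yielded lines and redirecting to allLTR.lib.reformed would reproduce the usage shown above.<br />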
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus paths (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
Preparation<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
(Add PATH(/data/skyts0401/program/maker/bin) to ~/.profile)<br />
ln -s ../assembly/Vradi.pacbio.gapfilled.final.fa .<br />
mkdir ../transcriptome<br />
cd ../transcriptome/<br />
scp skyts0401@147.46.250.244:/data/KangYJ/Mungbean/Transcriptome/merge/mungbean_merge.fa.cdhit.fa .<br />
cd ../maker/<br />
ln -s ../transcriptome/mungbean_merge.fa.cdhit.fa .<br />
mkdir ref<br />
cd ref/<br />
(download Fvesca annotation file from phytozome)<br />
unzip Fvesca_download.zip <br />
cd Fvesca/v1.1/annotation/<br />
gunzip Fvesca_226_v1.1.protein.fa.gz <br />
gunzip Fvesca_226_v1.1.transcript.fa.gz <br />
cd ../../..<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.protein.fa .<br />
cp Fvesca/v1.1/annotation/Fvesca_226_v1.1.transcript.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Athaliana_167_TAIR10*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Gmax*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Ptrichocarpa*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Vvinifera*.fa .<br />
scp skyts0401@147.46.250.244:/alima9002/ref/forJat/Osativa*.fa .<br />
cd ..<br />
ln -s ../repeatmask/ProtExclude/allRepeats.libnoProtFinal<br />
<br />
<br />
Running<br />
(63:/data/skyts0401/Mungbean/maker/)<br />
maker -CTL<br />
nano maker_bopts.ctl (default)<br />
nano maker_exe.ctl (change the path ncbi-blast+, RepeatMasker, exonerate, augustus)<br />
nano maker_opts.ctl (change the path genome, evidence(transcriptome, protein), repeat library)</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-08T06:06:36Z
<p>Skyts0401: </p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so construct the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML mapping method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
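<br />
Pairing here means keeping only reads whose mate is present in both FASTQ files. PE-pairing.py is not shown in these notes; the sketch below is one possible way to do it, assuming read names end in /1 and /2 (or share the same first token) and that the second file fits in memory. All function names are hypothetical.<br />

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for name in it:
        if not name.strip():
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield name.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def read_id(name):
    """Return the read identifier shared by both mates."""
    token = name.split()[0]
    if token.endswith("/1") or token.endswith("/2"):
        token = token[:-2]
    return token


def pair_reads(r1_records, r2_records):
    """Yield (mate1, mate2) tuples for reads present in both inputs, in R1 order."""
    r2_by_id = {read_id(n): (n, s, q) for n, s, q in r2_records}
    for n, s, q in r1_records:
        mate = r2_by_id.get(read_id(n))
        if mate is not None:
            yield (n, s, q), mate
```

The paired records would then be written back out as two FASTQ files in the same order (the *.pairing.fq files used later in these notes).<br />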
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports SAM or BAM output, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
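<br />
vcf_filtering.py is not shown in these notes. Judging only from the output name (0.15), one plausible reading is a filter on the fraction of reads supporting a non-reference allele. The sketch below makes exactly that assumption: it uses the AD field (requested with -t AD above), reads only the first sample column, and keeps sites with an alternate-read fraction of at least 0.15. The function names and the threshold semantics are hypothetical.<br />

```python
def alt_fraction(format_keys, sample_field):
    """Return the fraction of reads supporting non-reference alleles, or None."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if ad is None:
        return None
    try:
        depths = [int(x) for x in ad.split(",")]
    except ValueError:
        return None  # e.g. AD is "."
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total  # first AD value is the reference allele


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines plus records whose alt-read fraction passes the cutoff."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        if len(cols) < 10:
            continue
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac >= min_fraction:
            yield line
```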
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Write a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
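<br />
getCDS.py is not reproduced in these notes. As a rough sketch of the GFF case only, the code below groups CDS features by their Parent (falling back to ID or Name), joins the pieces in coordinate order, and reverse-complements minus-strand genes. It assumes GFF3-style attributes and does not handle trans-spliced genes such as rps12; all names are hypothetical.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_cds(gff_lines):
    """Return {name: [(seqid, start, end, strand), ...]} for CDS features."""
    cds = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Parent") or attrs.get("ID") or attrs.get("Name")
        if name is None:
            continue
        cds[name].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return cds


def extract_cds(genome, cds):
    """genome is {seqid: sequence}; returns {name: spliced CDS sequence}."""
    out = {}
    for name, parts in cds.items():
        parts = sorted(parts, key=lambda p: p[1])
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[s][start - 1:end] for s, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        out[name] = seq
    return out
```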
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold names are too long, so remove '|')<br />
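<br />
The thresholds noted for vcfparse.py (dp >= 5, missing < 13, hetero < 10) can be illustrated with a small per-site check. The script itself is not shown here, so this is only a sketch under assumptions: the thresholds are read as counts of samples, and a call with depth below 5 is treated as missing. Function names are hypothetical.<br />

```python
def classify(gt, dp, min_dp=5):
    """Return 'missing', 'het', or 'hom' for one sample call."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or dp < min_dp:
        return "missing"
    return "hom" if len(set(alleles)) == 1 else "het"


def keep_site(calls, max_missing=13, max_hetero=10, min_dp=5):
    """calls is a list of (GT, DP) tuples, one per sample.

    A site is kept when fewer than max_missing samples are missing and
    fewer than max_hetero samples are heterozygous.
    """
    kinds = [classify(gt, dp, min_dp) for gt, dp in calls]
    return kinds.count("missing") < max_missing and kinds.count("het") < max_hetero
```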
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter more strictly with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
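<br />
cal_seg_dist.py is not shown in these notes. A common way to flag segregation distortion is a chi-square test of the observed genotype counts against the expected ratio. The sketch below assumes a 1:1 expected ratio of 'a' and 'b' genotype codes (as for a RIL-type population, which is an assumption about this cross) and uses the 5% critical value for 1 degree of freedom (3.841). Names are hypothetical.<br />

```python
CHI2_CRIT_DF1_P05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def chi_square_1to1(n_a, n_b):
    """Chi-square statistic for observed counts against a 1:1 expectation."""
    total = n_a + n_b
    if total == 0:
        return 0.0
    expected = total / 2
    return (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected


def is_distorted(genotypes, critical=CHI2_CRIT_DF1_P05):
    """genotypes is a sequence of codes; anything other than 'a'/'b' is ignored."""
    n_a = sum(1 for g in genotypes if g == "a")
    n_b = sum(1 for g in genotypes if g == "b")
    return chi_square_1to1(n_a, n_b) > critical
```

For example, 60 'a' and 40 'b' calls give a statistic of 4.0, which exceeds 3.841, so that marker would be flagged.<br />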
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The build fails with the default Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer, having a problem with memory, was re-installed with a memory configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option, which mummerplot writes into the script. Just edit out.gp to delete that line.<br />
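<br />
Instead of editing out.gp by hand each time, the offending line can be dropped with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and it removes every line containing the option text shown in the error above.<br />

```python
def drop_lines_containing(lines, needle="set mouse clipboardformat"):
    """Return the lines that do not contain the unsupported gnuplot option."""
    return [line for line in lines if needle not in line]


def patch_script(path, needle="set mouse clipboardformat"):
    """Rewrite a gnuplot script in place; return how many lines were removed."""
    with open(path) as handle:
        lines = handle.readlines()
    kept = drop_lines_containing(lines, needle)
    with open(path, "w") as handle:
        handle.writelines(kept)
    return len(lines) - len(kept)
```

Calling patch_script("out.gp") and then running gnuplot on out.gp should regenerate the plot.<br />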
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that markers from the same chromosome land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
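<br />
blastparse.py is not shown in these notes. For BLAST tabular output (-outfmt 6), a typical parsing step is to keep the best hit per marker. The sketch below assumes the default 12-column layout and picks the hit with the highest bit score; the function name and the returned tuple layout are hypothetical.<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} keeping the top bit score.

    Expects default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            # Subject coordinates are reversed for minus-strand hits.
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best
```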
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with bowtie2 in end-to-end mode<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
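<br />
The two depth files above hold one line per position (contig, 1-based position, depth). A minimal sketch for summarizing them as mean depth per window, so the Illumina and PacBio profiles over 000000F:2000000-4000000 can be compared side by side; the function name and the default window size are assumptions, not part of the original workflow.<br />

```python
def windowed_mean_depth(depth_lines, window=10000):
    """Average `samtools depth` output (contig, position, depth) per window.

    Returns {(contig, window_start): mean_depth}, with window_start 1-based.
    """
    totals = {}
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        key = (contig, (pos - 1) // window)
        acc = totals.setdefault(key, [0, 0])
        acc[0] += depth  # summed depth in this window
        acc[1] += 1      # number of positions seen
    return {
        (contig, idx * window + 1): total / count
        for (contig, idx), (total, count) in totals.items()
    }
```

Usage would be along the lines of `windowed_mean_depth(open("newcontig_illumina_endtoend.mapping.depth"))`.<br />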
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
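<br />
The filtering scripts (pslfilter.py, pslfilter2.py) are not shown in these notes. As a rough sketch of one common way to filter blat PSL output, the function below keeps, for each query contig, the target with the most matching bases and drops hits covering less than a chosen fraction of the query. The function name and the 0.5 coverage cutoff are assumptions.<br />

```python
def best_psl_hits(psl_lines, min_query_cov=0.5):
    """Pick the best target per query from PSL lines.

    PSL columns used: 0 = matches, 9 = qName, 10 = qSize, 13 = tName.
    Header lines are skipped because their first field is not a number.
    """
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, qname = int(fields[0]), fields[9]
        qsize, tname = int(fields[10]), fields[13]
        if qsize == 0 or matches / qsize < min_query_cov:
            continue  # hit covers too little of the query (assumed cutoff)
        if qname not in best or matches > best[qname][1]:
            best[qname] = (tname, matches)
    return best
```

Feeding it the concatenated lines of every file listed in psl.list would give one best target per old contig.<br />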
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these packages -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR 99 results, make separate 99 and 85 directories in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
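<br />
The fasta_devide.py script used above (and earlier, to produce final.contigs_devide0.fa to final.contigs_devide19.fa for blat) is not shown. A minimal sketch of one way to split a FASTA file into a fixed number of chunks so that each chunk can be processed in parallel; the round-robin assignment and the function names are assumptions about how such a script could work, not a copy of the original.<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def split_records(records, n_chunks):
    """Distribute records round-robin over n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix):
    """Write each chunk to <prefix>_devide<i>.fa (naming follows the notes)."""
    for i, chunk in enumerate(chunks):
        with open(f"{prefix}_devide{i}.fa", "w") as out:
            for name, seq in chunk:
                out.write(f">{name}\n{seq}\n")
```

Round-robin keeps the number of sequences per chunk even; balancing by total sequence length would be a reasonable alternative when contig sizes vary a lot.<br />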
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (LTR library has '(' symbol, resulting in ProtExcluder error, so change the format)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
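<br />
As noted above, parentheses in the LTR library headers caused a ProtExcluder error, so the headers were reformatted first. The reformatting script itself is not shown; the sketch below shows one simple way to do it, replacing parentheses in header lines with underscores and leaving sequence lines untouched. The replacement character is an assumption.<br />

```python
import re


def clean_header(line):
    """Replace '(' and ')' in FASTA header lines; leave sequence lines as-is."""
    if not line.startswith(">"):
        return line
    return re.sub(r"[()]", "_", line)


def clean_fasta(lines):
    """Apply clean_header to every line of a FASTA file."""
    return [clean_header(line) for line in lines]
```

Any header change has to be applied before the library is concatenated into allRepeats.lib, so that the BLAST results and the library use the same names.<br />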
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<big>'''Install'''</big><br />
<br />
ncbi-blast+<br />
(63:/data/skyts0401/program/)<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
tar -xvzf ncbi-blast-2.6.0+-x64-linux.tar.gz<br />
<br />
<br />
exonerate<br />
(63:/data/skyts0401/program/)<br />
git clone https://github.com/nathanweeks/exonerate.git<br />
cd exonerate/<br />
git checkout v2.4.0<br />
autoreconf -i<br />
./configure<br />
make<br />
sudo make install<br />
<br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-06-08T05:49:13Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map was wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 were combined into one group and vr05 was split into 2 groups; need to check it.<br />
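<br />
A minimal sketch of the marker filter described above. It assumes JoinMap-style genotype codes ("a", "b", "h" for heterozygous, "-" for missing), reads both the missing and hetero thresholds as fractions of individuals (10%), and takes depth as a per-marker mean; the notes do not state these details, so all three are assumptions, as is the function name.<br />

```python
def keep_marker(genotypes, mean_depth,
                max_missing=0.10, max_hetero=0.10, min_depth=3):
    """Return True if a marker passes the missing / hetero / depth filter.

    genotypes: list of per-individual codes, e.g. ["a", "b", "h", "-"].
    Thresholds are assumed to be fractions of individuals.
    """
    n = len(genotypes)
    if n == 0:
        return False
    missing = sum(g == "-" for g in genotypes) / n
    hetero = sum(g == "h" for g in genotypes) / n
    return (missing <= max_missing
            and hetero <= max_hetero
            and mean_depth >= min_depth)
```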
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and chr 4 were combined and where chr 5 was split.<br />
<br />
ML mapping method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
so just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so used the bowtie algorithm instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so re-installed blasr (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
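<br />
The vcf_filtering.py script is not shown, and the notes do not say what the 0.15 in the output name refers to. One plausible reading is a minimum fraction of reads supporting a non-reference allele. Under that assumption, the sketch below computes the alternate-allele fraction from the AD field of a single-sample VCF line and keeps sites at or above the cutoff. The function names and the meaning of the cutoff are hypothetical.<br />

```python
def alt_fraction(vcf_line):
    """Fraction of reads supporting non-reference alleles, from FORMAT/AD.

    Expects a single-sample VCF data line; returns None when AD is missing,
    unparsable, or sums to zero.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys or len(values) != len(keys):
        return None
    try:
        counts = [int(x) for x in values[keys.index("AD")].split(",")]
    except ValueError:
        return None
    total = sum(counts)
    if total == 0:
        return None
    return sum(counts[1:]) / total  # counts[0] is the reference allele


def keep_site(vcf_line, min_alt_fraction=0.15):
    """Keep header lines and sites whose alt fraction reaches the cutoff."""
    if vcf_line.startswith("#"):
        return True
    fraction = alt_fraction(vcf_line)
    return fraction is not None and fraction >= min_alt_fraction
```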
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
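getCDS.py is not shown in these notes. A rough sketch of the GFF case only, assuming GFF3 with CDS features grouped by a Parent (or ID) attribute; all function names are hypothetical, and trans-spliced genes with segments on both strands are not handled:<br />

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first header word."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(chunks) for key, chunks in seqs.items()}


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_lines):
    """Return {gene id: CDS sequence}, joining CDS segments that share a Parent."""
    parts = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        gene = attrs.get("Parent") or attrs.get("ID") or "unknown"
        parts.setdefault(gene, []).append((f[0], int(f[3]), int(f[4]), f[6]))
    result = {}
    for gene, segments in parts.items():
        segments.sort(key=lambda seg: seg[1])
        # GFF coordinates are 1-based and inclusive
        seq = "".join(seqs[c][start - 1:end] for c, start, end, _ in segments)
        if segments[0][3] == "-":
            seq = revcomp(seq)
        result[gene] = seq
    return result
```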
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
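The thresholds noted for vcfparse.py above (dp >= 5, missing < 13, hetero < 10) can be illustrated as below. This is only a sketch: it assumes "missing" and "hetero" are counts of samples per marker and that a call below the depth cutoff counts as missing, which may differ from the real script.<br />

```python
def classify(gt, depth, min_depth=5):
    """Classify one sample call as 'missing', 'hetero', or 'homo'."""
    alleles = gt.replace("|", "/").split("/")
    if depth < min_depth or "." in alleles:
        return "missing"
    return "homo" if len(set(alleles)) == 1 else "hetero"


def keep_marker(calls, min_depth=5, max_missing=13, max_hetero=10):
    """calls is a list of (GT string, depth) tuples, one per sample."""
    kinds = [classify(gt, depth, min_depth) for gt, depth in calls]
    return (kinds.count("missing") < max_missing
            and kinds.count("hetero") < max_hetero)
```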
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
the build fails with the default Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer, having a problem with memory, was re-installed with a memory configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp to delete that line.<br />
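The manual edit can also be scripted. A small sketch, assuming the offending line always starts with "set mouse clipboardformat" as in the error message (function names are made up for illustration):<br />

```python
def drop_unsupported(lines, prefix="set mouse clipboardformat"):
    """Return the lines of a gnuplot script without the unsupported option."""
    return [line for line in lines if not line.lstrip().startswith(prefix)]


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the script in place and return the number of lines kept."""
    with open(path) as handle:
        kept = drop_unsupported(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
    return len(kept)
```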
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that each marker falls on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
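blastparse.py from step 1 is not recorded here. As a sketch of the idea, the tabular output of -outfmt 6 (default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) could be reduced to one best hit per marker like this; picking the hit with the highest bit score is an assumption:<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} for the top-scoring hit."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bits = int(f[8]), int(f[9]), float(f[11])
        if query not in best or bits > best[query][3]:
            # sstart > send for minus-strand hits, so normalise the order
            best[query] = (subject, min(sstart, send), max(sstart, send), bits)
    return best
```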
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Found that some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
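samtools depth writes one line per position (contig, 1-based position, depth). To compare the Illumina and pacbio depth files over the 2 Mb region, a windowed mean is easier to read than per-base values. A minimal sketch, with the window size chosen arbitrarily:<br />

```python
from collections import defaultdict


def window_means(lines, window=10000):
    """Return {(contig, window start): mean depth} from samtools depth lines."""
    totals = defaultdict(lambda: [0, 0])  # (contig, window index) -> [sum, n]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        bucket = totals[(contig, (pos - 1) // window)]
        bucket[0] += depth
        bucket[1] += 1
    return {
        (contig, index * window + 1): depth_sum / count
        for (contig, index), (depth_sum, count) in sorted(totals.items())
    }
```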
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
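contig_compare.sh expects twenty input files named final.contigs_devide0.fa to final.contigs_devide19.fa. fasta_devide.py is not shown, so the following is only a sketch of one way to produce them, assuming records are dealt out round-robin and that the output prefix is passed in by hand:<br />

```python
def parse_fasta(lines):
    """Return a list of (header line, sequence) tuples."""
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line.rstrip("\n"), []])
        elif records and line.strip():
            records[-1][1].append(line.strip())
    return [(header, "".join(chunks)) for header, chunks in records]


def split_records(records, n_parts=20):
    """Deal records into n_parts lists, round-robin."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


def write_parts(parts, prefix):
    """Write each part to <prefix>_devide<i>.fa and return the file names."""
    names = []
    for i, part in enumerate(parts):
        name = "%s_devide%d.fa" % (prefix, i)
        with open(name, "w") as out:
            for header, seq in part:
                out.write("%s\n%s\n" % (header, seq))
        names.append(name)
    return names
```

Round-robin keeps the number of records per file even, but not the total sequence length, so some blat jobs may still run much longer than others.<br />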
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR_99 ones, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (LTR library has '(' symbol, resulting in ProtExcluder error, so change the format)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
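The header reformatting step above (the '(' symbol in LTR library headers breaking ProtExcluder) is done by a script that is not recorded here. A minimal sketch of such a fix, assuming it is enough to replace parentheses in header lines with underscores:<br />

```python
def sanitize_header(header):
    """Replace parentheses in a FASTA header with underscores."""
    return header.replace("(", "_").replace(")", "_")


def sanitize_fasta_lines(lines):
    """Yield FASTA lines with cleaned headers; sequence lines pass through."""
    for line in lines:
        yield sanitize_header(line) if line.startswith(">") else line
```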
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/, bin/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
export AUGUSTUS_CONFIG_PATH="/data/skyts0401/program/augustus-3.2.3/config/"<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
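To compare the eukaryota and embryophyta runs, the one-line score that BUSCO prints in its short summary can be parsed. This sketch assumes the C:..%[S:..%,D:..%],F:..%,M:..%,n:.. notation and that the summary sits in the run directory (e.g. run_Mungbean_busco/, name assumed); the function name is made up:<br />

```python
import re

_SUMMARY = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages as a dict, or None if no score line is found."""
    match = _SUMMARY.search(text)
    if match is None:
        return None
    complete, single, duplicated, fragmented, missing = (
        float(value) for value in match.groups()[:5]
    )
    return {
        "complete": complete,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
        "n": int(match.group(6)),
    }
```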
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<big>'''Install'''</big><br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-29T05:28:40Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
this gave 2 contigs (one contig has LSC+IR, and the other has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
To polish with Quiver, PacBio_chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports sam or bam output, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
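<br />
The source of getCDS.py is not recorded here. The following is only a minimal sketch of the idea for the gff case, assuming a GFF3-style file in which CDS rows carry a Parent (or ID) attribute; the function names read_fasta, revcomp and cds_from_gff are hypothetical, and the gb (GenBank) case is not covered.<br />

```python
import re

# Complement table for reverse-complementing minus-strand features.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(chunks) for key, chunks in seqs.items()}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_text):
    """Join CDS rows that share a Parent (or ID) into one sequence each.

    Assumption: column 3 is 'CDS', coordinates are 1-based inclusive,
    and grouping by Parent/ID is good enough for a chloroplast GFF.
    """
    parts = {}  # key -> (seqid, strand, [(start, end), ...])
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        match = (re.search(r"(?:^|;)Parent=([^;]+)", cols[8])
                 or re.search(r"(?:^|;)ID=([^;]+)", cols[8]))
        key = match.group(1) if match else "%s:%d-%d" % (seqid, start, end)
        parts.setdefault(key, (seqid, strand, []))[2].append((start, end))

    result = {}
    for key, (seqid, strand, spans) in parts.items():
        spans.sort()
        seq = "".join(seqs[seqid][s - 1:e] for s, e in spans)
        result[key] = revcomp(seq) if strand == "-" else seq
    return result
```

Writing the returned dictionary out as ">key" followed by the sequence would give a file comparable to vr.pb.cp.gene.fasta, under the assumptions above.<br />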
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
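<br />
The filtering thresholds noted above (dp >= 5, missing < 13, hetero < 10) can be illustrated with the sketch below. It is not the code of vcfparse.py: it assumes that a genotype with DP below the minimum is counted as missing, that the counts are per marker across samples, and the names keep_record and filter_vcf are hypothetical.<br />

```python
def keep_record(line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if one VCF data line passes the per-marker thresholds.

    Assumptions: FORMAT contains GT (and optionally DP); a sample with a
    '.' allele or DP < min_dp counts as missing; a sample with two
    different alleles counts as heterozygous.
    """
    cols = line.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    gt_i = fmt.index("GT")
    dp_i = fmt.index("DP") if "DP" in fmt else None

    missing = hetero = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        alleles = fields[gt_i].replace("|", "/").split("/")
        depth = None
        if dp_i is not None and dp_i < len(fields) and fields[dp_i].isdigit():
            depth = int(fields[dp_i])
        if "." in alleles or (depth is not None and depth < min_dp):
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or keep_record(line, **thresholds):
            yield line
```

Calling filter_vcf(open("variants_snp_compare_parents.vcf")) and writing the yielded lines to a new file would mirror the step above, with the caveat that the real script may define missing and hetero differently.<br />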
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
the build fails because of the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer, having a problem with memory, was re-installed with a memory configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp to delete that line.<br />
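<br />
Instead of editing out.gp by hand, the offending line can be removed with a few lines of Python. This is a minimal sketch, assuming the only problem is the "set mouse clipboardformat" line shown in the error; the function names are hypothetical.<br />

```python
def drop_clipboardformat(gp_text):
    """Return gnuplot script text without 'set mouse clipboardformat' lines."""
    kept = [
        line
        for line in gp_text.splitlines(keepends=True)
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
    return "".join(kept)


def fix_file(path="out.gp"):
    """Rewrite the gnuplot script in place with the unsupported line removed."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(drop_clipboardformat(text))
```

After fix_file("out.gp"), running gnuplot out.gp should get past the line that raised "wrong option".<br />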
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
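<br />
The .mapping.depth files written by samtools depth are tab-separated lines of contig, 1-based position and depth. A minimal sketch for summarising them as mean depth per window, so the Illumina and PacBio profiles can be compared, could look like this; the window size and the function name are assumptions.<br />

```python
from collections import defaultdict


def windowed_mean_depth(lines, window=10000):
    """Mean depth per fixed-size window from 'samtools depth' lines.

    Returns {(contig, window_start): mean_depth}, where window_start is
    the 1-based first position of the window.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window * window + 1)
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

For example, windowed_mean_depth(open("newcontig_illumina_endtoend.mapping.depth")) returns one mean per 10 kb window of the 000000F:2000000-4000000 region.<br />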
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
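<br />
fasta_devide.py itself is not shown in these notes. As a rough sketch of what such a splitter could do, the code below distributes FASTA records round-robin into 20 parts so that contig_compare.sh can run blat on each part in parallel. The round-robin strategy, the function names and the output prefix handling are assumptions.<br />

```python
def split_fasta(text, n_parts=20):
    """Distribute FASTA records round-robin into n_parts text chunks."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append(current)
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append(current)

    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].extend(record)
    return ["\n".join(part) + "\n" if part else "" for part in parts]


def write_parts(fasta_path, prefix, n_parts=20):
    """Write the chunks to <prefix>_devide<i>.fa (hypothetical naming)."""
    with open(fasta_path) as handle:
        chunks = split_fasta(handle.read(), n_parts)
    for i, chunk in enumerate(chunks):
        with open("%s_devide%d.fa" % (prefix, i), "w") as out:
            out.write(chunk)
```

With write_parts("final.contigs.reformed.fasta", "final.contigs") the output names would match the final.contigs_devide${i}.fa pattern used in contig_compare.sh.<br />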
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (LTR library has '(' symbol, resulting in ProtExcluder error, so change the format)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
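<br />
The header reformatting step for allLTR.lib is only described above, not shown. A minimal sketch of one way to do it, assuming that replacing parentheses in FASTA header lines with underscores is enough for ProtExcluder (the actual script may rewrite the headers differently), is:<br />

```python
import re


def clean_headers(lines):
    """Yield FASTA lines with '(' and ')' in header lines replaced by '_'.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            body = re.sub(r"[()]", "_", line[1:].rstrip("\n"))
            yield ">" + body + "\n"
        else:
            yield line
```

Writing the lines from clean_headers(open("allLTR.lib")) to a new file would give something equivalent in role to allLTR.lib.reformed.<br />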
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
Maker<br />
<br />
<big>'''Install'''</big><br />
<br />
Maker<br />
(63:/data/skyts0401/program/)<br />
download from http://www.yandell-lab.org/software/maker.html<br />
cd maker/<br />
cd src/<br />
nano ~/.profile (add the RepeatMasker directory to $PATH)<br />
source ~/.profile<br />
perl Build.PL<br />
./Build install</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-29T05:10:36Z
<p>Skyts0401: /* 5/29 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so reconstruct the map<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
move the SAM data (reseq data aligned to the pacbio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
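For reference, a small sketch of Haldane's mapping function, assuming that is what the Haldane setting above refers to. It converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) cM and back; the function names are illustrative and not part of JoinMap.<br />

```python
import math


def haldane_cm(r):
    """Map distance in centiMorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse: recombination fraction for a map distance given in centiMorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))
```

Haldane's function assumes no crossover interference, so distances grow faster than r as r approaches 0.5.<br />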
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM or BAM is no longer supported in blasr, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
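What vcf_filtering.py does is not recorded here. The output name suggests a 0.15 cutoff, so the following is only a sketch under that assumption: keep sites whose alternative-allele fraction, computed from the INFO/AD tag requested in the mpileup command, is at least 0.15. Function names and the exact criterion are hypothetical.<br />

```python
def alt_allele_fraction(vcf_line):
    """Return the fraction of reads supporting non-reference alleles, or None.

    Assumption: the INFO column carries AD=ref,alt1,... as produced by
    'samtools mpileup -t INFO/AD'.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    if "AD" not in info:
        return None
    depths = [int(x) for x in info["AD"].split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines plus the records passing the (assumed) cutoff."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fraction = alt_allele_fraction(line)
        if fraction is not None and fraction >= min_fraction:
            yield line
```

A cutoff on allele fraction rather than on called genotypes would fit a chloroplast assembly, where low-frequency variants may reflect heteroplasmy or mis-mapped nuclear copies, but that reading is a guess.<br />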
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
write a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
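A rough sketch of how CDS extraction from a FASTA plus GFF pair can work, not the actual getCDS.py. It assumes tab-separated GFF3 with 1-based inclusive coordinates and groups CDS segments by their Parent (or ID) attribute; GenBank input and trans-spliced genes are not handled. All names are illustrative.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence name: sequence} from FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def cds_sequences(fasta_lines, gff_lines):
    """Return {gene: CDS sequence}, joining CDS segments per gene."""
    genome = read_fasta(fasta_lines)
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        gene = attrs.get("Parent") or attrs.get("ID") or "unknown"
        segments[(gene, cols[0], cols[6])].append((int(cols[3]), int(cols[4])))
    result = {}
    for (gene, seqid, strand), parts in segments.items():
        parts.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[seqid][start - 1:end] for start, end in parts)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        result[gene] = seq
    return result
```

Joining the segments in ascending coordinate order and reverse-complementing the whole string gives the correct reading direction for minus-strand genes.<br />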
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
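The marker filter noted above (dp >= 5, missing < 13, hetero < 10) can be sketched as below. This is an illustration, not vcfparse.py: it assumes the thresholds are per-marker sample counts, that genotypes are VCF GT strings, and that calls with depth below the minimum are counted as missing.<br />

```python
def classify_gt(gt):
    """Return 'missing', 'hetero' or 'homo' for a VCF GT string such as '0/1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    return "homo" if len(set(alleles)) == 1 else "hetero"


def keep_marker(calls, min_dp=5, max_missing=13, max_hetero=10):
    """Decide whether a marker passes the filter.

    calls: list of (GT string, depth) tuples, one per sample.
    Assumption: a call with depth below min_dp is treated as missing.
    """
    missing = hetero = 0
    for gt, depth in calls:
        kind = classify_gt(gt)
        if kind == "missing" or depth < min_dp:
            missing += 1
        elif kind == "hetero":
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Tightening max_missing (for example to 12) reduces the number of markers passed on to JoinMap.<br />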
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with 64-bit support<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
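Deleting the offending line by hand works; for repeated runs, a small sketch like the following could do it instead. It assumes the only unsupported statement is the one starting with "set mouse clipboardformat"; the function name is hypothetical.<br />

```python
def drop_clipboardformat(lines):
    """Return gnuplot script lines without 'set mouse clipboardformat ...' statements."""
    return [
        line for line in lines
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
```

Read out.gp into a list of lines, write the filtered list back, then run gnuplot on out.gp again.<br />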
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
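A sketch of how the tabular BLAST output from the step above could be reduced to one location per marker. blastparse.py itself is not shown in these notes, so this assumes a simple rule: keep the highest bit-score hit per query. The column order is the standard 12-column -outfmt 6 layout; best_hits is a hypothetical name.<br />

```python
def best_hits(lines):
    """Return {query: (subject, subject start, bit score)} for the top hit per query.

    Columns of -outfmt 6: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        query, subject = cols[0], cols[1]
        sstart, bits = int(cols[8]), float(cols[11])
        if query not in best or bits > best[query][2]:
            best[query] = (subject, sstart, bits)
    return best
```

Markers whose best hit falls on a different chromosome than in the previous reference are the ones worth inspecting in the SVG or Circos plot.<br />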
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads appeared to be split-mapped, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
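To compare the two depth files, the per-base output of samtools depth (three tab-separated columns: sequence name, 1-based position, depth) can be averaged over windows. A minimal sketch, with the window size and function name chosen only for illustration:<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(sequence, window start): mean depth} from samtools depth lines."""
    totals = defaultdict(lambda: [0, 0])  # key -> [depth sum, number of positions]
    for line in lines:
        chrom, pos, depth = line.split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1
        totals[(chrom, start)][0] += int(depth)
        totals[(chrom, start)][1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}
```

Because -a reports zero-depth positions too, windows with no coverage show up as 0.0 rather than being silently absent.<br />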
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
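The splitting step that feeds the 20 parallel blat jobs can be sketched as follows. This is not fasta_devide.py itself; it assumes records are simply dealt round-robin into 20 chunks, which would then be written to final.contigs_devide0.fa through final.contigs_devide19.fa. Function names are illustrative.<br />

```python
def fasta_records(lines):
    """Yield (header line, sequence) tuples from FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks=20):
    """Distribute records over n_chunks lists, one record at a time."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks
```

Round-robin keeps the record counts even, but not the total sequence length per chunk, so some blat jobs may still finish much later than others.<br />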
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
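The contents of fasta_devide.py are not recorded in these notes. A minimal sketch of such a splitter is given here, assuming it deals the records into a fixed number of files named PREFIX_devideN.fa (the naming follows the cat command above; the function names, the default of 20 parts, and the round-robin assignment are assumptions).<br />

```python
import os

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def split_fasta(path, n_parts=20):
    """Write records round-robin into PREFIX_devideN.fa and return the file names."""
    prefix = os.path.splitext(path)[0]
    names = ["%s_devide%d.fa" % (prefix, i) for i in range(n_parts)]
    outs = [open(name, "w") for name in names]
    try:
        for idx, (header, seq) in enumerate(read_fasta(path)):
            outs[idx % n_parts].write(">%s\n%s\n" % (header, seq))
    finally:
        for out in outs:
            out.close()
    return names
```

Round-robin assignment changes the record order once the pieces are concatenated again, which should not matter when each piece is masked independently.<br />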
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain '(' symbols, which cause a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
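The header reformatting script used above is not shown in these notes. A minimal sketch of the idea, assuming it is enough to replace parentheses, spaces, and other unusual characters in FASTA headers with underscores (the function names and the set of characters kept are assumptions):<br />

```python
import re

def clean_header(line):
    """Replace parentheses and other unusual characters in a FASTA header with '_'."""
    if not line.startswith(">"):
        return line
    name = line[1:].rstrip("\n")
    # Keep letters, digits and a few separators often seen in repeat library names.
    return ">" + re.sub(r"[^A-Za-z0-9_.#/:-]", "_", name) + "\n"

def reformat_fasta(lines):
    """Return FASTA lines with cleaned headers; sequence lines pass through unchanged."""
    return [clean_header(line) for line in lines]
```

The '#' and '/' characters are kept because RepeatMasker-style library names use them to carry the repeat class.<br />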
<br />
== 5/26 ==<br />
=== Mungbean pacbio assembly ===<br />
To assess the assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno<br />
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-29T05:10:00Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse the genotype data (JoinMap) and phenotype data into the IciMapping input format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
The map of linkage group 2 turned out to be wrong, so the map has to be constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping, vr03 and vr04 were combined into one group and vr05 was split into two groups; this needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the combined chr 3 and 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
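For reference, the Haldane mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) centimorgans, assuming no crossover interference. A minimal sketch of the formula and its inverse (the function names are illustrative; this is the general formula, not JoinMap code):<br />

```python
import math

def haldane_cm(r):
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))
```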
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Pairing Illumina PE reads (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
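PE-pairing.py itself is not included in these notes. A minimal sketch of read pairing, assuming the goal is to keep only reads whose ID occurs in both FASTQ files (the function names and the handling of /1 and /2 suffixes are assumptions):<br />

```python
def read_id(header):
    """Read name without the leading '@', any comment, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name

def parse_fastq(lines):
    """Group FASTQ lines into (read_id, [4 lines]) tuples, preserving file order."""
    lines = [line.rstrip("\n") for line in lines]
    return [(read_id(lines[i]), lines[i:i + 4])
            for i in range(0, len(lines) - 3, 4)]

def pair_reads(lines1, lines2):
    """Return (paired1, paired2): records present in both inputs, in file-1 order."""
    rec2 = dict(parse_fastq(lines2))
    paired1, paired2 = [], []
    for rid, record in parse_fastq(lines1):
        if rid in rec2:
            paired1.append(record)
            paired2.append(rec2[rid])
    return paired1, paired2
```

This version holds both files in memory, so it is only suitable for small inputs such as organelle read subsets.<br />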
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
Before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
Align the canu chloroplast contig file to the canu chloroplast contig assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
These were then simply joined into assemblies (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Quiver (GenomicConsensus) install (63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (alignment with the blasr algorithm was attempted, but SAM and BAM output are no longer supported in blasr, so the bowtie algorithm is used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
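vcf_filtering.py is not recorded here, and the meaning of the 0.15 in the output name is not stated. A minimal sketch under one possible reading, assuming the filter keeps sites where at least 15% of reads support a non-reference allele, taken from the AD field that the mpileup command above writes (the function names and this interpretation of the threshold are assumptions):<br />

```python
def alt_fraction(vcf_line):
    """Fraction of reads supporting non-reference alleles, from the first sample's AD field."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys:
        return None
    depths = [int(x) for x in values[keys.index("AD")].split(",") if x != "."]
    total = sum(depths)
    if total == 0:
        return None
    # AD lists the reference allele depth first, then each alternate allele.
    return float(total - depths[0]) / total

def filter_vcf(lines, min_fraction=0.15):
    """Keep header lines and records whose alternate-read fraction is at least min_fraction."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fraction = alt_fraction(line)
        if fraction is not None and fraction >= min_fraction:
            kept.append(line)
    return kept
```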
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Write a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
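getCDS.py is not reproduced in these notes. A minimal sketch of the GFF case only, assuming GFF3 input in which CDS segments of one gene share a Parent (or ID) attribute (the function names are illustrative; GenBank input, trans-spliced genes, and features spanning the origin of the circular genome are not handled):<br />

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "a": "t", "c": "g", "g": "c", "t": "a"}

def revcomp(seq):
    """Reverse complement of a DNA string; unknown symbols become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))

def parse_attributes(column9):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return dict(pairs)

def cds_sequences(genome, gff_lines):
    """Return {name: CDS sequence}; genome maps sequence IDs to sequences.

    CDS segments sharing a Parent (or ID) are joined in ascending coordinate
    order, then reverse-complemented when the feature is on the minus strand.
    """
    segments = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        name = (attrs.get("Parent") or attrs.get("ID")
                or "%s:%s-%s" % (cols[0], cols[3], cols[4]))
        segments.setdefault(name, []).append((int(cols[3]), int(cols[4]), cols[6], cols[0]))
    result = {}
    for name, parts in segments.items():
        parts.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[seqid][start - 1:end] for start, end, _, seqid in parts)
        result[name] = revcomp(seq) if parts[0][2] == "-" else seq
    return result
```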
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
SNP calling is done; filter the SNPs for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
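The filtering scripts above are not shown in these notes. A minimal sketch of the per-site filter, assuming the thresholds mean: a call with DP below 5 counts as missing, and a site is kept when fewer than 13 samples are missing and fewer than 10 are heterozygous (the function names and this reading of the thresholds are assumptions):<br />

```python
def classify(sample_field, keys, min_dp=5):
    """Return 'missing', 'het' or 'hom' for one sample column of a VCF record."""
    values = dict(zip(keys, sample_field.split(":")))
    gt = values.get("GT", "./.").replace("|", "/")
    dp = values.get("DP", ".")
    if "." in gt or dp == "." or int(dp) < min_dp:
        return "missing"
    alleles = gt.split("/")
    return "hom" if len(set(alleles)) == 1 else "het"

def keep_record(vcf_line, min_dp=5, max_missing=13, max_het=10):
    """True when the record has fewer than max_missing missing and max_het heterozygous calls."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    calls = [classify(field, keys, min_dp) for field in cols[9:]]
    return calls.count("missing") < max_missing and calls.count("het") < max_het
```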
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
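cal_seg_dist.py is not reproduced here. Segregation distortion is commonly tested with a chi-square goodness-of-fit test; a minimal sketch, assuming two homozygous genotype classes coded 'a' and 'b' with an expected 1:1 ratio (the genotype coding, the expected ratio, and the function names are assumptions about this population):<br />

```python
def chi_square_1to1(count_a, count_b):
    """Chi-square statistic (1 degree of freedom) for a 1:1 ratio of two genotype classes."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no scored genotypes")
    expected = total / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected

def is_distorted(genotypes, critical=3.841):
    """True when 'a'/'b' calls deviate from 1:1 at p < 0.05 (critical value for 1 d.f.)."""
    count_a = genotypes.count("a")
    count_b = genotypes.count("b")
    return chi_square_1to1(count_a, count_b) > critical
```

Missing calls are ignored because only 'a' and 'b' entries are counted.<br />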
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
Install MUMmer to draw a dot plot between the PacBio assembly and the previous reference<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add the path /home/skyts0401/lastz-distrib/bin to .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with 64-bit support <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the PacBio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
Then draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
gnuplot seems to have been updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
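The manual edit of out.gp can also be scripted. A minimal sketch that drops every 'set mouse clipboardformat' line from a gnuplot script (the function names are illustrative):<br />

```python
def strip_clipboardformat(lines):
    """Drop gnuplot 'set mouse clipboardformat' lines; keep everything else unchanged."""
    return [line for line in lines
            if not line.lstrip().startswith("set mouse clipboardformat")]

def fix_gnuplot_script(path):
    """Rewrite a mummerplot-generated .gp file in place without the unsupported line."""
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, "w") as handle:
        handle.writelines(strip_clipboardformat(lines))
```

After calling fix_gnuplot_script("out.gp"), the plot can be regenerated with gnuplot out.gp.<br />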
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
Comparison between the PacBio assembly and the previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
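blastparse.py is not included in these notes. A minimal sketch of one plausible parsing step, assuming the best hit per marker is chosen by bit score from the 12-column tabular output requested with -outfmt 6 (the function name and the best-hit criterion are assumptions):<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, sstart, send, bitscore)} keeping the top bit score per query.

    Expects the default 12 columns of BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```

A marker whose best hit lies on a different chromosome than its original linkage group would then be a candidate disagreement between the two assemblies.<br />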
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked split-mapped, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
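samtools depth writes one tab-separated line per position (sequence name, position, depth). A minimal sketch for summarizing such a file as mean depth per fixed-size window, which makes the Illumina and PacBio profiles easier to compare (the function name and the 10 kb default window are assumptions):<br />

```python
def window_means(depth_lines, window=10000):
    """Return [(chrom, window_start, mean_depth)] from samtools depth lines."""
    sums, counts, order = {}, {}, []
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split()[:3]
        # 1-based start coordinate of the window containing this position.
        start = (int(pos) - 1) // window * window + 1
        key = (chrom, start)
        if key not in sums:
            sums[key], counts[key] = 0, 0
            order.append(key)
        sums[key] += int(depth)
        counts[key] += 1
    return [(chrom, start, float(sums[(chrom, start)]) / counts[(chrom, start)])
            for chrom, start in order]
```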
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
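The two PSL filtering scripts are not recorded in these notes. A minimal sketch of a PSL filter, assuming alignments are kept when the matching bases cover at least 90% of the query contig (the function names, the fields retained, and the threshold are assumptions; the column positions follow the standard 21-column PSL layout written by blat):<br />

```python
def psl_records(lines):
    """Yield dicts for PSL alignment lines, skipping the optional text header."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        yield {
            "matches": int(cols[0]),
            "strand": cols[8],
            "q_name": cols[9],
            "q_size": int(cols[10]),
            "t_name": cols[13],
            "t_start": int(cols[15]),
            "t_end": int(cols[16]),
        }

def filter_psl(lines, min_coverage=0.9):
    """Keep alignments whose matching bases cover at least min_coverage of the query."""
    return [rec for rec in psl_records(lines)
            if float(rec["matches"]) / rec["q_size"] >= min_coverage]
```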
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
The mungbean super scaffold (JM-2.fasta) was gap-filled. The final assembly FASTA is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration is required at http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
(when prompted, set the directories of RECON, RepeatScout, nseg, trf, and rmblast)<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
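<br />
The contents of fasta_devide.py are not recorded in these notes. As a rough sketch only, assuming the script distributes the FASTA records over a fixed number of chunk files so that RepeatMasker can run on the chunks in parallel, the splitting step might look like this in Python 3. The function names, the round-robin assignment and the default of 20 chunks are assumptions, not the original script:<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_chunks):
    """Distribute records over n_chunks lists in round-robin order."""
    buckets = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        buckets[i % n_chunks].append(record)
    return buckets


def write_chunks(fasta_path, n_chunks=20):
    """Write <prefix>_devide<i>.fa files (hypothetical naming, modelled on the notes)."""
    prefix = fasta_path.rsplit(".", 1)[0]
    with open(fasta_path) as handle:
        buckets = split_records(read_fasta(handle), n_chunks)
    for i, bucket in enumerate(buckets):
        with open("%s_devide%d.fa" % (prefix, i), "w") as out:
            for header, seq in bucket:
                out.write(">%s\n%s\n" % (header, seq))
```

With round-robin assignment, concatenating the masked chunks returns the same records as the input, but in a different order.<br />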
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain the '(' symbol, which causes a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
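<br />
The header reformatting script used above is not shown in these notes. A minimal Python sketch of the idea, assuming it is enough to replace the parentheses in the FASTA header lines with underscores (the replacement character and the function names are assumptions):<br />

```python
import re


def clean_header(line):
    """Replace '(' and ')' in a FASTA header line; other lines pass through unchanged."""
    if line.startswith(">"):
        return re.sub(r"[()]", "_", line)
    return line


def reformat_fasta(lines):
    """Apply clean_header to every line of a FASTA file (hypothetical helper)."""
    return [clean_header(line) for line in lines]
```

Reading allLTR.lib line by line through reformat_fasta and writing the result out would give a library whose headers no longer contain parentheses.<br />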
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
To assess the assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/))<br />
<br />
<br />
<big>'''Running'''</big><br />
<br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-29T05:09:29Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parsing genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map was wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
Moving SAM data (reseq data aligned to the pacbio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups in which chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane algorithm<br />
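<br />
JoinMap applies the mapping function internally. For reference, Haldane's mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) cM. A small Python sketch of the formula and its inverse (the function names are illustrative and are not part of JoinMap):<br />

```python
import math


def haldane_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))
```

For small r the distance in cM is close to 100 r, and it grows without bound as r approaches 0.5.<br />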
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis using R/qtl (desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
only to check the locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
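<br />
PE-pairing.py itself is not included in these notes. As a hedged Python 3 sketch of what pairing means here, assuming the aim is to keep only the reads whose mate is present in both FASTQ files and that mates share a read ID apart from an optional /1 or /2 suffix (both are assumptions, as are the function names):<br />

```python
def read_fastq(lines):
    """Yield (name, seq, qual) from an iterable of FASTQ lines (4 lines per read)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def read_key(name):
    """Read ID without the description or a trailing /1 or /2 mate suffix."""
    key = name.split()[0]
    if key.endswith("/1") or key.endswith("/2"):
        key = key[:-2]
    return key


def pair_reads(reads1, reads2):
    """Return the reads of both files whose key occurs in both, in file-1 order."""
    by_key2 = {read_key(read[0]): read for read in reads2}
    paired1, paired2 = [], []
    for read in reads1:
        mate = by_key2.get(read_key(read[0]))
        if mate is not None:
            paired1.append(read)
            paired2.append(mate)
    return paired1, paired2
```

This version holds the second file in memory, which is simple but may not suit very large read sets.<br />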
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
Before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
These were then simply assembled together (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file with the gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
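<br />
getCDS.py is not reproduced in these notes. For the GFF case, a minimal Python 3 sketch of the extraction step could look like the following. It assumes GFF3 input whose CDS rows carry a Parent, ID or Name attribute, and a genome already loaded as a {seqid: sequence} dict; the function names are hypothetical:<br />

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def cds_sequences(genome, gff_lines):
    """Return {name: CDS sequence} from a {seqid: sequence} dict and GFF3 lines."""
    segments = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        name = attrs.get("Parent") or attrs.get("ID") or attrs.get("Name")
        if name is None:
            continue
        # GFF coordinates are 1-based and inclusive
        segments.setdefault(name, []).append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for name, parts in segments.items():
        parts.sort(key=lambda part: part[1])
        seq = "".join(genome[seqid][start - 1:end] for seqid, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        result[name] = seq
    return result
```

The sketch joins the CDS segments of a gene in coordinate order and takes the strand from the first segment, so trans-spliced chloroplast genes with segments on both strands (rps12, for example) would need separate handling.<br />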
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
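<br />
The vcfparse.py script is not shown here. One possible reading of the thresholds (dp >= 5, missing < 13, hetero < 10), sketched in Python 3: a sample whose genotype is absent or whose per-sample DP is below 5 counts as missing, and a record is kept when it has fewer than 13 missing samples and fewer than 10 heterozygous samples. This interpretation and the function names are assumptions, not the original script:<br />

```python
def genotype_calls(vcf_line, min_dp=5):
    """Classify each sample of one VCF record as 'hom', 'het' or 'missing'."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")  # FORMAT column, e.g. GT:DP
    calls = []
    for sample in cols[9:]:
        fields = dict(zip(keys, sample.split(":")))
        alleles = fields.get("GT", "./.").replace("|", "/").split("/")
        depth = fields.get("DP", ".")
        if "." in alleles or not depth.isdigit() or int(depth) < min_dp:
            calls.append("missing")
        elif len(set(alleles)) > 1:
            calls.append("het")
        else:
            calls.append("hom")
    return calls


def keep_record(vcf_line, min_dp=5, max_missing=13, max_het=10):
    """True if the record has fewer than max_missing missing and max_het het calls."""
    calls = genotype_calls(vcf_line, min_dp)
    return calls.count("missing") < max_missing and calls.count("het") < max_het


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the records that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, **thresholds):
            yield line
```

The per-sample DP field is available because the mpileup commands above request it with -t DP.<br />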
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt with the 64-bit flag (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like:<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot has been updated and no longer supports that option written by mummerplot. Just edit out.gp and delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
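<br />
blastparse.py is not shown in these notes. A minimal Python sketch of one way to reduce the tabular BLAST output (-outfmt 6) to a single placement per marker, assuming the best hit is the one with the highest bit score (this criterion and the function name are assumptions):<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, sstart, send, bitscore)} keeping the top-bitscore hit.

    Expects the default 12 columns of BLAST -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```

Comparing the subject sequence of each best hit with the chromosome the marker came from then shows whether a marker stayed on the same chromosome in the new assembly.<br />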
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
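<br />
The three-column output of samtools depth (sequence name, position, depth) is easier to compare between the Illumina and PacBio alignments once it is summarised. A small Python sketch that averages the depth in fixed windows; the window size and the function name are arbitrary choices for illustration:<br />

```python
from collections import defaultdict


def window_mean_depth(depth_lines, window=10000):
    """Return sorted [(chrom, window_start, mean_depth)] from `samtools depth` lines."""
    totals = defaultdict(lambda: [0, 0])  # (chrom, window index) -> [depth sum, positions]
    for line in depth_lines:
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window)
        totals[key][0] += int(depth)
        totals[key][1] += 1
    return [(chrom, idx * window + 1, float(total) / count)
            for (chrom, idx), (total, count) in sorted(totals.items())]
```

Windows in which the mean depth drops sharply in one data set but not in the other are candidates for a closer look.<br />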
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
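<br />
The pslfilter scripts are not recorded in these notes. As a hedged Python sketch of a typical PSL filter, assuming the goal is to keep, for each query contig, the alignment with the most matching bases and to drop alignments that cover less than 90% of the query (the threshold, the criterion and the function names are all assumptions):<br />

```python
def parse_psl(lines):
    """Yield dicts for PSL alignment lines, skipping the BLAT header lines."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        yield {
            "matches": int(cols[0]) + int(cols[2]),  # matches + repMatches
            "strand": cols[8],
            "qName": cols[9], "qSize": int(cols[10]),
            "tName": cols[13], "tStart": int(cols[15]), "tEnd": int(cols[16]),
        }


def best_hit_per_query(hits, min_coverage=0.9):
    """Keep the hit with the most matching bases per query, above a coverage cutoff."""
    best = {}
    for hit in hits:
        if float(hit["matches"]) / hit["qSize"] < min_coverage:
            continue
        current = best.get(hit["qName"])
        if current is None or hit["matches"] > current["matches"]:
            best[hit["qName"]] = hit
    return best
```

The column positions follow the standard 21-column PSL layout written by blat.<br />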
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
Basic command which I used is based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, separate 99 and 85 directories were made in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain '(' symbols, which cause a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib<br />
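The header reformatting step used before ProtExcluder is not shown in these notes. A minimal sketch of one way to do it, assuming that replacing parentheses in FASTA headers with underscores is enough (the replacement character and function names are assumptions):<br />

```python
import re


def sanitize_header(header):
    """Replace '(' and ')' in a FASTA header with underscores."""
    return re.sub(r"[()]", "_", header)


def reformat_fasta(lines):
    """Yield FASTA lines with sanitized headers; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + sanitize_header(line[1:])
        else:
            yield line
```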
<br />
== 5/29 ==<br />
=== Mungbean pacbio assembly ===<br />
For assessment of assembly, run CEGMA and BUSCO<br />
<br />
<br />
<big>'''Install'''</big><br />
CEGMA<br />
(63:/data/skyts0401/program/)<br />
sudo apt-get install wise (dependency)<br />
wget ftp://genome.crg.es/pub/software/geneid/geneid_v1.4.4.Jan_13_2011.tar.gz (dependency)<br />
tar -xvzf geneid_v1.4.4.Jan_13_2011.tar.gz<br />
cd geneid<br />
make<br />
make install<br />
nano ~/.profile (add $PATH:/data/skyts0401/program/geneid/bin)<br />
cd ..<br />
git clone https://github.com/KorfLab/CEGMA_v2.git<br />
cd CEGMA_v2/<br />
make<br />
<br />
<br />
BUSCO<br />
(63:/data/skyts0401/program/)<br />
wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus-3.2.3.tar.gz (dependency)<br />
tar -xvzf augustus-3.2.3.tar.gz <br />
cd augustus-3.2.3/<br />
make (dependency error)<br />
sudo apt-get install bamtools libbamtools-dev<br />
make<br />
sudo make install<br />
cd ..<br />
git clone https://gitlab.com/ezlab/busco.git<br />
cd busco<br />
sudo python setup.py install<br />
cp config/config.ini.default config/config.ini<br />
nano config/config.ini (change the Augustus path (path = /data/skyts0401/program/augustus-3.2.3/scripts/))<br />
<br />
<br />
<br />
<big>'''Running'''</big><br />
CEGMA<br />
(63:/data/skyts0401/Mungbean/cegma/)<br />
export CEGMA="/data/skyts0401/program/CEGMA_v2"<br />
export PERL5LIB="$PERL5LIB:$CEGMA/lib"<br />
source ~/.profile <br />
/data/skyts0401/program/CEGMA_v2/bin/cegma --genome standard_output.gapfilled.final.fa -threads 5<br />
<br />
BUSCO<br />
(63:/data/skyts0401/Mungbean/busco/)<br />
wget http://busco.ezlab.org/datasets/eukaryota_odb9.tar.gz (dataset)<br />
wget http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz (dataset)<br />
ln -s ../assembly/standard_output.gapfilled.final.fa .<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_busco -c 20 -l eukaryota_odb9/ -m geno<br />
python /data/skyts0401/program/busco/scripts/run_BUSCO.py -i standard_output.gapfilled.final.fa -o Mungbean_plant_busco -c 20 -l embryophyta_odb9/ -m geno</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-29T04:51:10Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseg_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 were filtered out<br />
while grouping them, found that vr03 and vr04 were combined into one group and vr05 was split into 2 groups; this needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the chr 3, 4 combined and chr 5 split linkage groups.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu-ctg.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we got 2 contigs (one contig has LSC+IR, and the other has SSC+IR)<br />
<br />
simply assembled them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
cd GenomicConsensus/<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
To run Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so used the bowtie algorithm instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
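vcf_filtering.py is not shown in these notes. Judging only from the output name, 0.15 may be an alternate-allele fraction cutoff; that reading is an assumption. Under it, a minimal sketch could use the per-sample AD values that `-t AD` adds to the mpileup output (single-sample VCF assumed, function names hypothetical):<br />

```python
def alt_fraction(format_field, sample_field):
    """Fraction of reads supporting non-reference alleles, from the AD tag.

    Returns None when AD is absent, unparsable, or the total depth is zero.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    if idx >= len(values):
        return None
    try:
        depths = [int(x) for x in values[idx].split(",")]
    except ValueError:
        return None
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines and records whose alt fraction is >= min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac >= min_fraction:
            yield line
```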
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
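getCDS.py itself is not shown in these notes. A minimal sketch of the GFF case, assuming GFF3-style CDS rows grouped by their Parent (or ID) attribute and joined in coordinate order; function names are hypothetical, and trans-spliced genes with segments on both strands are not handled:<br />

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_attributes(field):
    """Parse a GFF3 attribute column into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def cds_sequences(genome, gff_lines):
    """Return {gene name: spliced CDS} given {seqid: sequence} and GFF lines."""
    parts = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        name = attrs.get("Parent") or attrs.get("ID") or f"{seqid}:{start}-{end}"
        parts.setdefault(name, []).append((start, end, strand, seqid))
    result = {}
    for name, segments in parts.items():
        segments.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[sid][a - 1:b] for a, b, _, sid in segments)
        if segments[0][2] == "-":
            seq = reverse_complement(seq)
        result[name] = seq
    return result
```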
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
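The vcfparse.py thresholds noted above (dp >= 5, missing < 13, hetero < 10) can be read as per-marker sample counts. That reading is an assumption, and the script is not shown here. Under it, a minimal sketch of the marker test (function names hypothetical):<br />

```python
def classify(gt, dp, min_dp=5):
    """Return 'missing', 'het' or 'hom' for one sample genotype call."""
    if dp < min_dp or "." in gt:
        return "missing"
    alleles = gt.replace("|", "/").split("/")
    return "hom" if len(set(alleles)) == 1 else "het"


def keep_marker(calls, max_missing=13, max_hetero=10, min_dp=5):
    """calls: list of (GT string, depth) per sample.

    Keep the marker when fewer than max_missing samples are missing
    and fewer than max_hetero samples are heterozygous.
    """
    states = [classify(gt, dp, min_dp) for gt, dp in calls]
    return states.count("missing") < max_missing and states.count("het") < max_hetero
```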
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
make fails with the default Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit compile flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the PacBio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into the script. Just edit out.gp to delete that line.<br />
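<br />
The manual edit of out.gp can also be scripted. A minimal sketch, assuming the only offending lines are the ones starting with "set mouse clipboardformat" (the function names are my own, not part of MUMmer):<br />

```python
import re
from pathlib import Path

# Matches the option that newer gnuplot versions reject.
BAD_LINE = re.compile(r"\s*set\s+mouse\s+clipboardformat\b")


def strip_clipboardformat(script_text):
    """Return the gnuplot script text without 'set mouse clipboardformat' lines."""
    kept = [line for line in script_text.splitlines(keepends=True)
            if not BAD_LINE.match(line)]
    return "".join(kept)


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the script in place (out.gp is mummerplot's default prefix)."""
    script = Path(path)
    script.write_text(strip_clipboardformat(script.read_text()))
```

After calling fix_gnuplot_script(), the plot can be regenerated by running gnuplot on the edited out.gp.<br />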
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that markers from the same chromosome land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with bowtie2 in end-to-end mode<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
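<br />
The .mapping.depth files are tab-separated (contig, position, depth), one line per base. A minimal sketch for averaging them in fixed windows before plotting; the window size and function name are my own choices:<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(contig, window_start): mean depth} from `samtools depth` lines."""
    totals = defaultdict(lambda: [0, 0])  # key -> [depth sum, position count]
    for line in lines:
        if not line.strip():
            continue
        contig, pos, depth = line.rstrip("\n").split("\t")[:3]
        # Positions are 1-based; windows start at 1, window + 1, ...
        start = (int(pos) - 1) // window * window + 1
        acc = totals[(contig, start)]
        acc[0] += int(depth)
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    import sys
    # Usage sketch: python this_script.py newcontig_pacbio_endtoend.mapping.depth
    with open(sys.argv[1]) as handle:
        for (contig, start), mean in sorted(window_mean_depth(handle).items()):
            print(contig, start, round(mean, 2), sep="\t")
```

Running it on both the Illumina and the PacBio depth files gives two tracks over the same windows that can be compared directly.<br />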
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
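<br />
The loop expects the query FASTA already split into 20 pieces. fasta_devide.py is my own script and is not shown here. A minimal sketch of one way to do the split, assuming round-robin assignment of contigs and an output name of the form prefix_devideN.fa (both are assumptions):<br />

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts=20):
    """Distribute records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def write_parts(fasta_path, n_parts=20):
    """Write prefix_devide0.fa ... prefix_devide{n_parts-1}.fa next to the input."""
    prefix = fasta_path.rsplit(".", 1)[0]
    with open(fasta_path) as handle:
        parts = split_records(read_fasta(handle), n_parts)
    for i, part in enumerate(parts):
        with open(f"{prefix}_devide{i}.fa", "w") as out:
            for header, seq in part:
                out.write(f">{header}\n{seq}\n")
```

Round-robin keeps the number of contigs per piece even, but not the total length, so the 20 blat jobs can still finish at different times.<br />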
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
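<br />
pslfilter.py and pslfilter2.py are my own scripts and are not shown here. A minimal sketch of a PSL filter, assuming the goal is to keep the best hit per query contig above a minimum query coverage (the 0.9 threshold and the function names are assumptions). Column positions follow the standard PSL layout: matches is column 1, qName 10, qSize 11, tName 14, tStart 16, tEnd 17.<br />

```python
def parse_psl(lines):
    """Yield dicts for PSL alignment rows, skipping the header blat writes."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # header or malformed line
        yield {
            "matches": int(fields[0]),
            "q_name": fields[9],
            "q_size": int(fields[10]),
            "t_name": fields[13],
            "t_start": int(fields[15]),
            "t_end": int(fields[16]),
        }


def best_hits(rows, min_query_cov=0.9):
    """Keep, per query, the row with most matches among rows passing coverage."""
    best = {}
    for row in rows:
        if row["matches"] / row["q_size"] < min_query_cov:
            continue
        current = best.get(row["q_name"])
        if current is None or row["matches"] > current["matches"]:
            best[row["q_name"]] = row
    return best
```

The per-piece PSL files listed in psl.list could be chained into parse_psl one after another, since header lines are skipped automatically.<br />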
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
Transposase protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the Mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR 99 results with these, make separate 99 and 85 directories in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib<br />
<br />
Collecting repetitive sequences<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
nano fasta_devide.py<br />
python fasta_devide.py standard_output.gapfilled.final.fa<br />
nano repeatmask_combine.sh<br />
chmod a+x repeatmask_combine.sh <br />
./repeatmask_combine.sh <br />
cat standard_output.gapfilled.final_devide*.fa.masked > standard_output.gapfilled.final.fa.masked<br />
perl ~/bin/CRL_Scripts1.0/rmaskedpart.pl standard_output.gapfilled.final.fa.masked 50 > umseqfile<br />
/data/skyts0401/program/RepeatModeler-open-1.0.9/BuildDatabase -name umseqfiledb -engine ncbi umseqfile <br />
nohup /data/skyts0401/program/RepeatModeler-open-1.0.9/RepeatModeler -database umseqfiledb >& umseqfile.out<br />
perl ~/bin/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile consensi.fa.classified --unknowns repeatmodeler_unknowns.fasta --identities repeatmodeler_identities.fasta<br />
makeblastdb -in ~/bin/Tpases020812 -dbtype prot<br />
blastx -query repeatmodeler_unknowns.fasta -db ~/bin/Tpases020812 -evalue 1e-10 -num_descriptions 10 -out modelerunknown_blast_result.txt<br />
~/bin/CRL_Scripts1.0/transposon_blast_parse.pl --blastx modelerunknown_blast_result.txt --modelerunknown repeatmodeler_unknowns.fasta<br />
mv unknown_elements.txt ModelerUnknown.lib<br />
cat identified_elements.txt repeatmodeler_identities.fasta > ModelerID.lib<br />
<br />
Exclusion of gene fragments<br />
makeblastdb -in ~/bin/alluniRefprexp070416 -dbtype prot<br />
blastx -query ModelerUnknown.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out ModelerUnknown.lib_blast_result.txt<br />
cd LTR/<br />
python headerforamt.py allLTR.lib > allLTR.lib.reformed (the LTR library headers contain the '(' symbol, which causes a ProtExcluder error, so the header format is changed)<br />
cd ..<br />
mkdir ProtExclude<br />
cd ProtExclude/<br />
cp ../MITE/MITE.lib .<br />
cp ../LTR/allLTR.lib.reformed .<br />
cp ../ModelerID.lib .<br />
cp ../ModelerUnknown.lib .<br />
cat allLTR.lib.reformed MITE.lib ModelerID.lib > KnownRepeats.lib<br />
cat KnownRepeats.lib ModelerUnknown.lib > allRepeats.lib<br />
blastx -query allRepeats.lib -db ~/bin/alluniRefprexp070416 -evalue 1e-10 -num_descriptions 10 -out allRepeats.lib_blast_results.txt<br />
/data/skyts0401/program/ProtExcluder1.2/ProtExcluder.pl allRepeats.lib_blast_results.txt allRepeats.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-16T06:38:44Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using just the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML mapping algorithm, Haldane mapping function<br />
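<br />
For reference, the Haldane mapping function converts a recombination fraction r into map distance as d = -50 ln(1 - 2r) cM, assuming no crossover interference. A minimal sketch (the function names are my own; this is not JoinMap code):<br />

```python
import math


def haldane_cm(r):
    """Map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.3):
        print(r, round(haldane_cm(r), 3))
```

For small r the distance is close to 100 r cM, and it grows without bound as r approaches 0.5.<br />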
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
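<br />
PE-pairing.py is my own script and is not shown here. A minimal sketch of the pairing idea, assuming 4-line FASTQ records and read names that match between the two files apart from an optional /1 or /2 suffix (the function names are hypothetical):<br />

```python
def read_fastq(lines):
    """Yield (read_id, [4 record lines]) from well-formed 4-line FASTQ."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it), next(it), next(it)
        # Drop "@", any comment after whitespace, and a /1 or /2 mate suffix.
        read_id = header[1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, [header, seq, plus, qual]


def pair_reads(lines1, lines2):
    """Return the records of both files that have a mate, in file-1 order."""
    mates2 = dict(read_fastq(lines2))  # holds all of file 2 in memory
    paired1, paired2 = [], []
    for read_id, record in read_fastq(lines1):
        if read_id in mates2:
            paired1.extend(record)
            paired2.extend(mates2[read_id])
    return paired1, paired2
```

Keeping file 2 in a dictionary is simple but memory-hungry for a full Illumina run; for very large files an index on disk would be safer.<br />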
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Aligning the PacBio chloroplast reads to vr.pb.cp.fasta for Quiver needs pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
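<br />
A minimal sketch of what such a script might look like for the GFF case. The actual getCDS.py is not shown in this note, so the function names, the grouping of CDS parts by the GFF3 Parent attribute, and the plain-text parsing are all assumptions. Genes whose parts lie on different strands (trans-spliced genes) are not handled.<br />
```python
# Hypothetical sketch, not the actual getCDS.py: pull CDS sequences out of
# FASTA sequences using GFF3 coordinates. Standard library only.
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_lines):
    """Return {parent_id: CDS sequence}; parts sharing a Parent are joined."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        key = attrs.get("Parent") or attrs.get("ID") or f"{cols[0]}:{cols[3]}-{cols[4]}"
        # GFF coordinates are 1-based and inclusive.
        parts[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    genes = {}
    for key, segs in parts.items():
        segs.sort(key=lambda s: s[1])
        seq = "".join(seqs[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            # Assumes every part of the gene is on the same strand.
            seq = reverse_complement(seq)
        genes[key] = seq
    return genes
```
The GenBank case (v.radiata.gb) would need a separate parser and is left out of this sketch.<br />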
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
SNP calling is done; filtering SNPs for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
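<br />
The thresholds passed to vcfparse.py above (dp >= 5, missing < 13, hetero < 10) could be applied roughly as in the sketch below. The script itself is not shown in this note, so this is only an illustration: it assumes that the thresholds are per-site counts of samples and that a genotype with depth below the minimum counts as missing. The function names are hypothetical.<br />
```python
# Hypothetical sketch, not the actual vcfparse.py. The thresholds are read
# as per-site counts of samples, which is an assumption.

def site_passes(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if a VCF data line passes the depth/missing/hetero filters."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    gt_i = fmt.index("GT")
    dp_i = fmt.index("DP") if "DP" in fmt else None
    missing = hetero = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        alleles = fields[gt_i].replace("|", "/").split("/")
        depth = None
        if dp_i is not None and dp_i < len(fields) and fields[dp_i].isdigit():
            depth = int(fields[dp_i])
        if "." in alleles or (depth is not None and depth < min_dp):
            missing += 1          # uncalled, or too shallow to trust
        elif len(set(alleles)) > 1:
            hetero += 1           # e.g. 0/1
    return missing < max_missing and hetero < max_hetero


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and data lines that pass site_passes()."""
    for line in lines:
        if line.startswith("#") or site_passes(line, **thresholds):
            yield line
```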
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
There is a problem with the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt and re-installed with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it produces an error like this:<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option written by mummerplot. Just edit out.gp and delete that line.<br />
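<br />
Deleting that line by hand works; the same edit could also be scripted. The sketch below is only an illustration with made-up function names, assuming the script is the default out.gp written by mummerplot.<br />
```python
# Sketch: drop the "set mouse clipboardformat" line that newer gnuplot rejects.

def strip_clipboardformat(lines):
    """Return the gnuplot script lines without 'set mouse clipboardformat ...'."""
    return [ln for ln in lines
            if not ln.lstrip().startswith("set mouse clipboardformat")]


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the script at `path` in place; return the number of lines removed."""
    with open(path) as handle:
        lines = handle.readlines()
    kept = strip_clipboardformat(lines)
    with open(path, "w") as handle:
        handle.writelines(kept)
    return len(lines) - len(kept)
```
After calling fix_gnuplot_script("out.gp"), the plot can be regenerated with gnuplot out.gp.<br />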
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
Comparison between the pacbio assembly and the previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
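<br />
blastparse.py is not shown in this note. As a rough illustration of the parsing step, the sketch below keeps the highest-scoring hit per marker from the tabular BLAST output (-outfmt 6). The function name and the choice of bitscore as the ranking criterion are assumptions.<br />
```python
# Hypothetical sketch, not the actual blastparse.py: keep the best-scoring
# hit per query from BLAST tabular output (-outfmt 6).

def best_hits(lines):
    """Return {query: (subject, sstart, send, bitscore)} for the top hit per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        query, subject = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```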
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. Contig comparison. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
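<br />
To compare the Illumina and PacBio depth files over this region, the per-base output of samtools depth (chromosome, position, depth on each line) can be summarised per window. The sketch below is one way to do it; the function name and the 10 kb window size are arbitrary choices, not taken from this note.<br />
```python
# Sketch: summarise `samtools depth` output (chrom, pos, depth per line)
# as mean depth per fixed-size window. The window size is an arbitrary choice.
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(chrom, window_start): mean depth} with 1-based window starts."""
    total = defaultdict(int)
    count = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1
        total[(chrom, start)] += int(depth)
        count[(chrom, start)] += 1
    return {key: total[key] / count[key] for key in total}
```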
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
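<br />
fasta_devide.py is not shown in this note. The blat loop in contig_compare.sh expects 20 files named final.contigs_devide0.fa to final.contigs_devide19.fa, so the split might look like the sketch below. The round-robin assignment of records and the function names are assumptions.<br />
```python
# Hypothetical sketch, not the actual fasta_devide.py: split FASTA records
# round-robin into a fixed number of chunks.

def split_fasta(lines, n_chunks=20):
    """Return a list of n_chunks lists of FASTA lines, whole records only."""
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1                      # a new record starts here
        if index >= 0:
            chunks[index % n_chunks].append(line)
    return chunks


def write_chunks(chunks, pattern="final.contigs_devide{}.fa"):
    """Write chunk i to pattern.format(i); return the file names written."""
    names = []
    for i, chunk in enumerate(chunks):
        name = pattern.format(i)
        with open(name, "w") as handle:
            handle.writelines(chunk)
        names.append(name)
    return names
```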
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration is required at http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
(63:/data/skyts0401/program/)<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/binaries/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking process'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make separate directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-16T03:37:56Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; this needs to be checked.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and where chr 5 is split.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
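<br />
PE-pairing.py is not shown in this note. A minimal sketch of one way to pair reads is given below: it keeps only the reads whose mate is present in both FASTQ files. The function names, the handling of /1 and /2 name suffixes, and loading the second file fully into memory are assumptions, so this is only suitable for small inputs.<br />
```python
# Hypothetical sketch, not the actual PE-pairing.py: keep only reads that
# have a mate in both FASTQ files. The second file is held in memory.

def read_fastq(lines):
    """Yield (name, record_lines) for each 4-line FASTQ record."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            name = record[0][1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]            # drop the mate suffix
            yield name, record
            record = []


def pair_reads(lines_1, lines_2):
    """Return (paired_1, paired_2): records present in both inputs, file-1 order."""
    reads_2 = dict(read_fastq(lines_2))
    paired_1, paired_2 = [], []
    for name, record in read_fastq(lines_1):
        if name in reads_2:
            paired_1.extend(record)
            paired_2.extend(reads_2[name])
    return paired_1, paired_2
```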
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but SAM or BAM is no longer supported in blasr, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Variant calling with PE and PB reads on the chloroplast assembly (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
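The notes do not say what vcf_filtering.py does or what 0.15 refers to. The sketch below is only one plausible reading: it assumes 0.15 is a minimum fraction of non-reference reads per site, taken from the AD field that the mpileup command above requests. The function names and the threshold meaning are assumptions, not the original script.<br />

```python
def alt_fraction(vcf_line):
    """Fraction of reads supporting any non-reference allele, from the AD field.

    Assumes a single-sample VCF line whose FORMAT column contains AD
    (comma-separated read counts, reference allele first).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
    ad = [int(x) for x in fields.get("AD", "").split(",") if x.isdigit()]
    total = sum(ad)
    return (total - ad[0]) / total if total else 0.0


def filter_by_alt_fraction(lines, min_frac=0.15):
    """Yield header lines unchanged and site lines whose alt fraction >= min_frac."""
    for line in lines:
        if line.startswith("#") or alt_fraction(line) >= min_frac:
            yield line
```

With this reading, a site with AD 80,20 is kept and a site with AD 95,5 is dropped.<br />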
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
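getCDS.py itself is not shown in these notes. As a rough sketch of the GFF case only, the following assumes GFF3 input in which CDS rows carry a Parent (or ID) attribute; the function names are hypothetical, and GenBank input and trans-spliced genes are not handled.<br />

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first word of the header."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_lines, id_key="Parent"):
    """Join the CDS segments of each gene; minus-strand genes are reverse complemented."""
    parts = {}  # gene id -> (seqid, strand, [(start, end), ...])
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(kv.strip().split("=", 1) for kv in f[8].split(";") if "=" in kv)
        gene = attrs.get(id_key) or attrs.get("ID") or f"{f[0]}:{f[3]}-{f[4]}"
        entry = parts.setdefault(gene, (f[0], f[6], []))
        entry[2].append((int(f[3]), int(f[4])))
    out = {}
    for gene, (seqid, strand, segs) in parts.items():
        segs.sort()  # GFF coordinates are 1-based and inclusive
        seq = "".join(seqs[seqid][s - 1:e] for s, e in segs)
        out[gene] = revcomp(seq) if strand == "-" else seq
    return out
```

Printing each entry of the returned dictionary as a ">name" line followed by its sequence gives a gene FASTA like the redirected output above.<br />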
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
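The filter thresholds noted above (dp >= 5, missing < 13, hetero < 10) could be applied per site roughly as follows. This is a sketch, not vcfparse.py: it assumes that samples below the depth threshold count as missing and that the missing and hetero limits are counts of samples; the function names are hypothetical.<br />

```python
def parse_sample(fmt, sample):
    """Return (genotype, depth) for one VCF sample column; depth is None if absent."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    gt = fields.get("GT", "./.").replace("|", "/")
    dp = fields.get("DP", "")
    return gt, int(dp) if dp.isdigit() else None


def keep_site(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """True if the site has fewer than max_missing missing and max_hetero heterozygous samples."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt, samples = cols[8], cols[9:]
    missing = hetero = 0
    for sample in samples:
        gt, dp = parse_sample(fmt, sample)
        alleles = gt.split("/")
        if "." in alleles or dp is None or dp < min_dp:
            missing += 1  # assumption: low-depth calls are treated as missing
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and site lines that pass keep_site."""
    for line in lines:
        if line.startswith("#") or keep_site(line, **thresholds):
            yield line
```

Tightening the filter as on 2/23 would then be a matter of calling filter_vcf(lines, max_missing=12).<br />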
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The build fails with the default Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with 64-bit support (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot has been updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp to delete that line.<br />
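If the plot is regenerated often, the offending line can be removed by a small script instead of by hand. This is a minimal sketch; the function name is hypothetical, and it drops every line that starts with the unsupported setting.<br />

```python
def drop_unsupported_lines(script_text, needle="set mouse clipboardformat"):
    """Return the gnuplot script text without lines that start with `needle`."""
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if not line.lstrip().startswith(needle)
    ]
    return "".join(kept)


def clean_gnuplot_script(path="out.gp"):
    """Rewrite the script at `path` in place with the unsupported lines removed."""
    with open(path) as handle:
        cleaned = drop_unsupported_lines(handle.read())
    with open(path, "w") as handle:
        handle.write(cleaned)
```

After calling clean_gnuplot_script("out.gp"), running gnuplot on out.gp should get past the line that caused the error.<br />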
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
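The per-base output of samtools depth (tab-separated contig, position, depth) is easier to compare between the Illumina and PacBio alignments after averaging over windows. A minimal sketch follows; the window size and function name are assumptions for illustration.<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Average per-base depth over fixed windows.

    `lines` are rows of `samtools depth` output: contig, 1-based position, depth.
    Returns {(contig, window_start): mean depth}, with window_start 1-based.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window * window + 1)
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

For example, passing the open newcontig_illumina_endtoend.mapping.depth file handle as `lines` gives one mean value per 10 kb window of the 000000F region.<br />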
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
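contig_compare.sh expects the query FASTA to be split into 20 files numbered 0 to 19. fasta_devide.py is not shown in the notes; a sketch of such a splitter, assuming round-robin assignment of records and a hypothetical output prefix, could look like this.<br />

```python
def read_records(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line, []])
        elif line and records:
            records[-1][1].append(line)
    return [(header, "".join(chunks)) for header, chunks in records]


def split_records(records, n_parts=20):
    """Distribute records round-robin over n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def write_parts(parts, prefix):
    """Write each part to <prefix><index>.fa and return the file names."""
    names = []
    for i, part in enumerate(parts):
        name = f"{prefix}{i}.fa"
        with open(name, "w") as out:
            for header, seq in part:
                out.write(f"{header}\n{seq}\n")
        names.append(name)
    return names
```

Calling write_parts(split_records(records), "final.contigs_devide") would produce file names matching the ones used in the blat loop.<br />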
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(a Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if have a problem with dependency, please check this -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
hmmer - for ProtExcluder<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64/ -p /data/skyts0401/program/ProtExcluder1.2/<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
Basic command which I used is based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-16T02:49:08Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so construct the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Assemble the corrected mungbean PacBio chloroplast reads with canu, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
so just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Quiver requires the PacBio chloroplast reads to be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM is no longer supported in blasr, so used the bowtie algorithm instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB reads on the chloroplast assembly (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
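vcf_filtering.py itself is not recorded in these notes. A minimal sketch of what such a filter could look like, assuming that the 0.15 in the output name is a minimum alternate-allele fraction computed from the AD field of the first sample (this meaning, and the function names, are assumptions):<br />

```python
MIN_ALT_FRACTION = 0.15  # assumed meaning of "0.15" in the output file name


def alt_fraction(format_keys, sample_field):
    """Return alt reads / total reads from the AD field, or None if unavailable."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if not ad or "." in ad.split(","):
        return None
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    # AD lists the reference depth first, then one depth per alternate allele.
    return sum(depths[1:]) / total


def filter_vcf(lines, min_fraction=MIN_ALT_FRACTION):
    """Yield header lines and records whose first sample passes the threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac >= min_fraction:
            yield line
```

With an open file handle as input, the filtered records could be written out with sys.stdout.writelines(filter_vcf(handle)).<br />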
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
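getCDS.py is not shown here. A minimal sketch of the GFF case, assuming GFF3 input in which CDS rows carry a Parent (or ID) attribute; the function names are illustrative and GenBank parsing is left out:<br />

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into a dict of sequence name -> sequence string."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None and line:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def cds_sequences(fasta, gff_lines, id_key="Parent"):
    """Join the CDS segments of each gene and return gene id -> CDS sequence."""
    segments = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        gene = attrs.get(id_key) or attrs.get("ID", "unknown")
        segments.setdefault(gene, []).append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for gene, parts in segments.items():
        strand = parts[0][3]
        parts.sort(key=lambda p: p[1])
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(fasta[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        result[gene] = seq
    return result
```

This sketch takes the strand from the first segment of each gene, so trans-spliced genes with segments on both strands would need separate handling.<br />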
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
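The vcfparse.py thresholds above (dp >= 5, missing < 13, hetero < 10) can be sketched as a per-site test. This assumes that a genotype with depth below 5 counts as missing and that the other two limits are numbers of samples; both readings are assumptions, and the real script is not shown:<br />

```python
def classify(format_keys, sample_field, min_dp=5):
    """Return 'missing', 'het' or 'hom' for one sample column of a VCF record."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    gt = values.get("GT", "./.").replace("|", "/")
    dp = values.get("DP", "0")
    # Uncalled genotypes and low-depth calls are both treated as missing.
    if "." in gt or not dp.isdigit() or int(dp) < min_dp:
        return "missing"
    alleles = gt.split("/")
    return "hom" if len(set(alleles)) == 1 else "het"


def keep_site(record, max_missing=13, max_hetero=10, min_dp=5):
    """True if the site has fewer than max_missing missing and max_hetero het calls."""
    cols = record.rstrip("\n").split("\t")
    calls = [classify(cols[8], sample, min_dp) for sample in cols[9:]]
    return calls.count("missing") < max_missing and calls.count("het") < max_hetero
```

Lowering max_missing (for example to 12) keeps fewer sites, which is the kind of adjustment needed when there are too many markers for map construction.<br />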
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
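Instead of editing out.gp by hand, the offending line can be dropped with a few lines of Python. This is a sketch; the function names are illustrative, and the default file name out.gp is taken from the error message above:<br />

```python
def strip_unsupported_lines(lines, needle="set mouse clipboardformat"):
    """Return gnuplot script lines without those starting with the rejected option."""
    return [line for line in lines if not line.lstrip().startswith(needle)]


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the script in place, dropping the lines newer gnuplot rejects."""
    with open(path) as handle:
        kept = strip_unsupported_lines(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
```

After calling fix_gnuplot_script(), running gnuplot on the edited out.gp should redraw the plot.<br />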
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map the 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
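blastparse.py is not shown. A minimal sketch of one way to reduce the -outfmt 6 table to a single location per marker, assuming the best hit is the one with the highest bit score (column 12 of the standard 12-column output); the function name is illustrative:<br />

```python
def best_hits(blast_lines):
    """Map each query id to (subject id, subject start, bitscore) of its top hit."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, bitscore = int(cols[8]), float(cols[11])
        # Keep only the highest-scoring hit for each marker.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, sstart, bitscore)
    return best
```

Markers from one linkage group whose best hits fall on different super scaffolds are the ones worth checking.<br />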
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
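The .mapping.depth files have one row per position (contig, position, depth). A minimal sketch for summarising them as mean depth per window, so that the Illumina and PacBio profiles over the same region can be compared; the window size and the function name are assumptions:<br />

```python
from collections import defaultdict


def window_mean_depth(depth_lines, window=10000):
    """Average depth per (contig, window start) from `samtools depth` rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        contig, pos, depth = cols[0], int(cols[1]), int(cols[2])
        # Positions are 1-based; windows start at 1, window + 1, and so on.
        start = (pos - 1) // window * window + 1
        totals[(contig, start)] += depth
        counts[(contig, start)] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Windows where one data set drops sharply while the other stays flat are candidates for a closer look.<br />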
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
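fasta_devide.py is not shown. A minimal sketch of splitting a FASTA file into the 20 pieces that contig_compare.sh expects (final.contigs_devide0.fa to final.contigs_devide19.fa), assuming records are distributed round-robin; the real script may split differently:<br />

```python
def split_fasta_records(lines, n_parts=20):
    """Distribute FASTA records round-robin into n_parts lists of lines."""
    parts = [[] for _ in range(n_parts)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1
        if index >= 0:
            parts[index % n_parts].append(line)
    return parts


def write_parts(parts, prefix="final.contigs_devide", suffix=".fa"):
    """Write each part to prefix + part number + suffix and return the file names."""
    names = []
    for number, part in enumerate(parts):
        name = f"{prefix}{number}{suffix}"
        with open(name, "w") as handle:
            handle.writelines(part)
        names.append(name)
    return names
```

Round-robin keeps the number of contigs per piece even, although not the total sequence length, so the 20 background blat jobs may still finish at different times.<br />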
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking process is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(a Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
set the directories of trf and rmblast when the configure script asks for them<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if have a problem with dependency, please check this -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812.gz<br />
gunzip Tpases020812.gz<br />
<br />
plant protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/alluniRefprexp070416.gz<br />
gunzip alluniRefprexp070416.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/*<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
set the directories of RECON, RepeatScout, nseg, trf and rmblast when the configure script asks for them<br />
<br />
hmmer - for ProtExcluder<br />
wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2-linux-intel-x86_64.tar.gz<br />
tar -xvzf hmmer-3.1b2-linux-intel-x86_64.tar.gz <br />
cd hmmer-3.1b2-linux-intel-x86_64/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
ProtExcluder<br />
wget http://www.hrt.msu.edu/uploads/535/78637/ProtExcluder1.2.tar.gz<br />
tar -xvzf ProtExcluder1.2.tar.gz <br />
cd ProtExcluder1.2/<br />
./Installer.pl -m /data/skyts0401/program/hmmer-3.1b2-linux-intel-x86_64 -p .<br />
<br />
<br />
<big>'''Repeat masking process'''</big><br />
<br />
The commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTRs (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR_99 results, make separate directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-10T02:12:44Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined in one group and vr05 is split into 2 groups; needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3, 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
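For reference, Haldane's mapping function converts a recombination fraction r into a map distance of -50 ln(1 - 2r) cM, assuming no crossover interference. A small sketch (the function names are illustrative and this is not JoinMap code):<br />

```python
import math


def haldane_cm(r):
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))
```

For small r the distance is close to 100 * r cM, and it grows without bound as r approaches 0.5.<br />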
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
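PE-pairing.py is not shown. A minimal sketch of pairing, assuming it means keeping only the reads whose mate is present in both FASTQ files and writing them in the same order (read ids are compared after removing a trailing /1 or /2; the function names are illustrative):<br />

```python
def read_fastq(lines):
    """Yield (read id without /1 or /2 suffix, four-line record) from FASTQ lines."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            read_id = record[0][1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, record
            record = []


def pair_reads(lines_1, lines_2):
    """Return records of reads found in both files, in the order of the first file."""
    mates = dict(read_fastq(lines_2))
    paired_1, paired_2 = [], []
    for read_id, record in read_fastq(lines_1):
        if read_id in mates:
            paired_1.extend(record)
            paired_2.extend(mates[read_id])
    return paired_1, paired_2
```

This version holds the whole second file in memory, which is only practical for small read sets such as an extracted chloroplast subset.<br />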
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align the canu cp contigs to the canu cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig contains LSC+IR and the other contains SSC+IR)<br />
<br />
then these were just assembled together (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio chloroplast reads to vr.pb.cp.fasta (PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Quiver requires the PacBio_chloroplast reads to be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports SAM or BAM output, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (suspected library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
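The note does not show vcf_filtering.py itself. As a minimal sketch, assuming the 0.15 in the output name is a minimum fraction of reads supporting non-reference alleles (taken from the AD tag of the first sample), such a filter could look like the following. The function names and the threshold meaning are assumptions, not the original script.<br />

```python
def alt_fraction(ad_field):
    """Fraction of reads supporting non-reference alleles, from an AD value such as '40,8'."""
    depths = [int(x) for x in ad_field.split(",") if x not in (".", "")]
    total = sum(depths)
    if total == 0:
        return 0.0
    # The first AD entry is the reference allele depth.
    return (total - depths[0]) / total


def filter_vcf_lines(lines, min_fraction=0.15):
    """Yield header lines unchanged, plus records whose first sample passes the threshold.

    min_fraction=0.15 is an assumed reading of the '0.15' in the output file name.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        if "AD" not in sample:
            continue
        if alt_fraction(sample["AD"]) >= min_fraction:
            yield line


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            sys.stdout.writelines(filter_vcf_lines(handle))
```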
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
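getCDS.py is not reproduced in this note. A minimal sketch of the GFF case only, assuming GFF3 input where CDS rows carry a Parent or ID attribute, could look like this. All function names are hypothetical, GenBank input is not handled, and genes whose CDS segments lie on different strands (trans-spliced genes) are not joined correctly.<br />

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Return {sequence name: sequence} from FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequences(genome, gff_lines):
    """Concatenate the CDS segments of each gene in coordinate order."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene = attrs.get("Parent") or attrs.get("ID") or "unnamed"
        segments[(gene, cols[0], cols[6])].append((int(cols[3]), int(cols[4])))
    result = {}
    for (gene, seqid, strand), parts in segments.items():
        parts.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[seqid][start - 1:end] for start, end in parts)
        result[gene] = reverse_complement(seq) if strand == "-" else seq
    return result
```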
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
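The filter thresholds noted above (dp >= 5, missing < 13, hetero < 10) can be sketched as a per-marker test. This is an illustration only, not the contents of vcfparse.py: it assumes the thresholds are counts of individuals and that a call below the depth cutoff is treated as missing. The function name and arguments are hypothetical.<br />

```python
def keep_marker(genotypes, depths, min_depth=5, max_missing=13, max_hetero=10):
    """Decide whether one SNP marker passes the filter.

    genotypes: GT strings per individual, e.g. '0/0', '0/1', './.'
    depths:    read depth (DP) per individual, same order
    Assumptions: thresholds are counts of individuals, and calls with
    depth below min_depth are counted as missing.
    """
    missing = 0
    hetero = 0
    for gt, dp in zip(genotypes, depths):
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or dp < min_depth:
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Tightening the filter as in a later run would then only mean passing a smaller max_missing value.<br />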
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
there is a problem with the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems the installed gnuplot is newer and no longer supports this option, which mummerplot writes into its script. Just edit out.gp and delete that line.<br />
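Instead of editing out.gp by hand each time, the offending line can be dropped with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes the only unsupported statement is the "set mouse clipboardformat" line shown in the error.<br />

```python
def strip_mouse_lines(script_text):
    """Remove 'set mouse clipboardformat ...' lines from a gnuplot script."""
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
    return "".join(kept)


if __name__ == "__main__":
    import sys
    # Usage (assumed): python fix_gp.py out.gp > out.fixed.gp
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            sys.stdout.write(strip_mouse_lines(handle.read()))
```

The cleaned script can then be passed to gnuplot directly.<br />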
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like they were split-mapped, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
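The samtools depth output is one line per position (contig, position, depth), which is hard to read over a 2 Mb region. A minimal sketch for averaging it in fixed windows follows. The function name and the 10 kb window size are assumptions for illustration.<br />

```python
from collections import defaultdict


def windowed_mean_depth(lines, window=10000):
    """Average per-base depth in fixed windows.

    lines: 'contig<TAB>position<TAB>depth' rows as written by samtools depth.
    Returns {(contig, window_start): mean depth}, with 1-based window starts.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        start = (int(cols[1]) - 1) // window * window + 1
        key = (cols[0], start)
        totals[key] += int(cols[2])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for (contig, start), mean in windowed_mean_depth(handle).items():
                print(f"{contig}\t{start}\t{mean:.2f}")
```

Running it on both the Illumina and PacBio depth files gives two tables that can be compared window by window.<br />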
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
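fasta_devide.py is not shown here. Since contig_compare.sh expects 20 files named final.contigs_devide0.fa to final.contigs_devide19.fa, a minimal sketch of such a splitter could look like this. The function names and the round-robin assignment are assumptions; round-robin balances the number of contigs per file, not the total sequence length.<br />

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA lines."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_parts=20):
    """Distribute records over n_parts lists in turn."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


def write_parts(parts, prefix="final.contigs_devide"):
    """Write each part to <prefix><i>.fa, matching the names used in contig_compare.sh."""
    for i, part in enumerate(parts):
        with open(f"{prefix}{i}.fa", "w") as handle:
            for header, seq in part:
                handle.write(f">{header}\n{seq}\n")
```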
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be extracted into RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver. 2.6.0 had installation problems, so I installed ver. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these packages -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the Mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-04T07:58:44Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the linkage groups with chr 3 and 4 combined and chr 5 split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
The result is 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
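<br />
vcf_filtering.py itself is not reproduced in these notes. Below is a minimal sketch of such a filter, assuming that the 0.15 in the output name is a minimum alternate-allele read fraction taken from the AD field of the first sample. The function names and this reading of the threshold are assumptions, not the original script.<br />

```python
def alt_fraction(format_keys, sample_field):
    """Return the alternate-allele read fraction from the AD field, or None."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if ad is None or "." in ad.split(","):
        return None
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    # AD lists the reference depth first, then one depth per ALT allele.
    return sum(depths[1:]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines plus records whose first sample passes the threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no sample column, nothing to evaluate
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac >= min_fraction:
            yield line
```

To use it, open variants_PE_bwa_all.vcf, pass the file handle to filter_vcf, and write the yielded lines to the output file.<br />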
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
write a script that reads a FASTA file and an annotation file (GFF or GenBank) and makes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
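<br />
getCDS.py is not shown in these notes. The following is a rough sketch of the GFF case only, assuming GFF3-style attributes (Parent= or ID=) and that all CDS segments of a gene lie on one sequence and one strand. All function names are hypothetical.<br />

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first word of the header."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


_COMP = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMP)[::-1]


def cds_from_gff(seqs, gff_lines):
    """Return {gene_id: CDS sequence}, joining CDS segments that share a Parent (or ID)."""
    parts = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        key = attrs.get("Parent") or attrs.get("ID") or f"{cols[0]}:{cols[3]}-{cols[4]}"
        parts.setdefault(key, []).append((int(cols[3]), int(cols[4]), cols[6], cols[0]))
    genes = {}
    for key, segs in parts.items():
        segs.sort()  # join segments in coordinate order
        strand, chrom = segs[0][2], segs[0][3]
        seq = "".join(seqs[chrom][start - 1:end] for start, end, _, _ in segs)
        genes[key] = revcomp(seq) if strand == "-" else seq
    return genes
```

Genes whose segments lie on different strands (trans-spliced genes) are not handled by this sketch.<br />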
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
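<br />
The filter in vcfparse.py (dp >= 5, missing < 13, hetero < 10) is not shown here. A minimal sketch of one possible reading: a genotype with DP below 5 counts as missing, and a site is kept when fewer than 13 samples are missing and fewer than 10 are heterozygous. This interpretation and the function names are assumptions.<br />

```python
def genotype_counts(format_keys, samples, min_dp=5):
    """Count (missing, heterozygous) samples at one VCF site."""
    keys = format_keys.split(":")
    gt_i = keys.index("GT")
    dp_i = keys.index("DP") if "DP" in keys else None
    missing = hetero = 0
    for sample in samples:
        fields = sample.split(":")
        alleles = fields[gt_i].replace("|", "/").split("/")
        dp_ok = True
        if dp_i is not None and dp_i < len(fields) and fields[dp_i].isdigit():
            dp_ok = int(fields[dp_i]) >= min_dp
        if "." in alleles or not dp_ok:
            missing += 1  # uncalled or too shallow
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing, hetero


def keep_site(line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True when a VCF record passes the missing and hetero limits."""
    cols = line.rstrip("\n").split("\t")
    missing, hetero = genotype_counts(cols[8], cols[9:], min_dp)
    return missing < max_missing and hetero < max_hetero
```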
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
the Makefile causes a build problem, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was re-built with the 64-bit flag (SIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
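<br />
The manual edit of out.gp can also be scripted. A small sketch, assuming every line that starts with "set mouse clipboardformat" should be dropped (function names are hypothetical):<br />

```python
def strip_unsupported(lines, prefix="set mouse clipboardformat"):
    """Return gnuplot script lines without those starting with the given prefix."""
    return [line for line in lines if not line.lstrip().startswith(prefix)]


def clean_script(path):
    """Rewrite a gnuplot script in place without the unsupported lines."""
    with open(path) as handle:
        kept = strip_unsupported(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
```

Calling clean_script("out.gp") and then running gnuplot on out.gp should reproduce the manual fix.<br />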
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds, to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
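<br />
blastparse.py is not included in these notes. A minimal sketch of one plausible parsing step, assuming the goal is to keep the best-scoring hit per marker from the tabular (-outfmt 6) output. The function name and the choice of bitscore as the criterion are assumptions.<br />

```python
def best_hits(lines):
    """Return {query: (subject, sstart, send, bitscore)} keeping the top bitscore per query."""
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a complete outfmt 6 row
        query, subject = cols[0], cols[1]
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```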
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
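<br />
The per-base depth files are large, so a windowed summary is easier to inspect. A minimal sketch, assuming the three-column samtools depth output (sequence name, 1-based position, depth); the window size and function name are arbitrary choices, not part of the original workflow.<br />

```python
from collections import defaultdict


def window_means(lines, window=10000):
    """Return {(chrom, window_start): mean depth} from `samtools depth` rows."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        # 1-based start coordinate of the window containing this position
        start = (int(pos) - 1) // window * window + 1
        sums[(chrom, start)] += int(depth)
        counts[(chrom, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Windows where the Illumina and PacBio means both drop are candidates for a closer look.<br />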
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
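<br />
fasta_devide.py is not shown. contig_compare.sh expects 20 files named final.contigs_devide0.fa to final.contigs_devide19.fa, so a sketch of the split could look like this, assuming records are distributed round-robin (the original script may split differently; function names are hypothetical):<br />

```python
def split_records(lines, parts=20):
    """Distribute FASTA records round-robin into `parts` lists of lines."""
    chunks = [[] for _ in range(parts)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header starts the next record
        if index >= 0:
            chunks[index % parts].append(line)
    return chunks


def write_chunks(chunks, prefix="final.contigs_devide"):
    """Write each chunk to <prefix><i>.fa, matching the names used in contig_compare.sh."""
    for i, chunk in enumerate(chunks):
        with open(f"{prefix}{i}.fa", "w") as handle:
            handle.writelines(chunk)
```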
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, check these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.9.tar.gz<br />
tar -xvzf RepeatModeler-open-1.0.9.tar.gz <br />
cd RepeatModeler-open-1.0.9/<br />
perl ./configure<br />
configure directory of RECON, RepeatScout, nseg, trf, rmblast<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 80<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-04T07:47:05Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so construct the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assemble mungbean PacBio corrected reads for the chloroplast with canu, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
this gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports SAM or BAM, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and makes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
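getCDS.py itself is not recorded here. A minimal sketch of the same idea, assuming a GFF3 file whose CDS rows carry a Parent (or ID) attribute; the function names are hypothetical and GenBank (gb) input is not handled:<br />

```python
import sys

# Complement table used for minus-strand features.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_lines):
    """Join the CDS segments of each parent feature into one sequence."""
    segments = {}  # (seqid, parent, strand) -> list of (start, end)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent") or attrs.get("ID") or "unknown"
        key = (cols[0], parent, cols[6])
        segments.setdefault(key, []).append((int(cols[3]), int(cols[4])))
    result = {}
    for (seqid, parent, strand), parts in segments.items():
        parts.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(seqs[seqid][start - 1:end] for start, end in parts)
        result[parent] = reverse_complement(seq) if strand == "-" else seq
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gff:
        for name, seq in cds_from_gff(read_fasta(fasta), gff).items():
            print(f">{name}\n{seq}")
```

Segments are joined in ascending genomic order and the whole join is reverse-complemented for minus-strand genes, which gives the coding orientation.<br />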
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
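The vcfparse.py script is not shown in this note. A rough sketch of the stated filter (dp >= 5, missing < 13, hetero < 10), assuming that a genotype with DP below the cutoff is counted as missing and that the thresholds are sample counts per site; function names are hypothetical:<br />

```python
def site_passes(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if a VCF data line passes the depth/missing/hetero filter."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    gt_i = fmt.index("GT")
    dp_i = fmt.index("DP") if "DP" in fmt else None
    missing = hetero = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        alleles = fields[gt_i].replace("|", "/").split("/")
        # Unknown depth (no DP field) is treated as depth 0 (assumption).
        dp = 0
        if dp_i is not None and dp_i < len(fields) and fields[dp_i].isdigit():
            dp = int(fields[dp_i])
        if "." in alleles or dp < min_dp:
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or site_passes(line, **thresholds):
            yield line
```

For the stricter 2/23 setting, the same function would be called with max_missing=12.<br />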
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mugbean_pacbio_scaffold_7_seg_dist_foramt.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it produced an error like:<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option, which mummerplot writes into the script. Just edit out.gp to delete that line.<br />
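The manual edit can also be scripted. A small sketch, assuming the only offending line is the one that starts with "set mouse clipboardformat"; the function name is hypothetical:<br />

```python
import sys


def drop_clipboardformat(gp_text):
    """Remove 'set mouse clipboardformat ...' lines from a gnuplot script."""
    kept = [
        line
        for line in gp_text.splitlines(keepends=True)
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
    return "".join(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Rewrites the given script (e.g. out.gp) in place.
    path = sys.argv[1]
    with open(path) as handle:
        cleaned = drop_clipboardformat(handle.read())
    with open(path, "w") as handle:
        handle.write(cleaned)
```

After cleaning, out.gp can be passed to gnuplot directly to draw the plot.<br />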
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that markers from the same LG fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
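fasta_devide.py is not shown here. A sketch of one way to produce the 20 files that contig_compare.sh expects (final.contigs_devide0.fa to final.contigs_devide19.fa), assuming records are dealt out round-robin; the real script may split differently and the function names are hypothetical:<br />

```python
import sys


def parse_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA lines."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_parts=20):
    """Deal records into n_parts lists, one record at a time."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


def write_parts(parts, prefix="final.contigs_devide"):
    """Write each part to <prefix><i>.fa."""
    for i, part in enumerate(parts):
        with open(f"{prefix}{i}.fa", "w") as out:
            for header, seq in part:
                out.write(f">{header}\n{seq}\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        write_parts(split_round_robin(parse_fasta(handle)))
```

Round-robin keeps the number of contigs per file even, but not the total sequence length, so the blat jobs may still finish at different times.<br />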
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(a Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/Mungbean/repeatmask/LTR/99/)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_99.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR99.lib --pcoverage 90 --pidentity 80<br />
<br />
Relatively old LTR (same commands as above, but for relatively old LTRs)<br />
(/data/skyts0401/Mungbean/repeatmask/LTR/85)<br />
(to avoid confusing these results with the LTR_99 results, make directories 99 and 85 in the LTR directory)<br />
ln -s /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa .<br />
gt suffixerator -db standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files/<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files/<br />
cd fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory . --step2 CRL_Step2_Passed_Elements.fasta --pidentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ..<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result85 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile standard_output.gapfilled.final.fa <br />
cp lLTR_Only.lib ../lLTR_Only_85.lib<br />
cat lLTR_Only.lib ../../MITE/MITE.lib > repeats_to_mask_LTR85.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR85.fasta -nolow -dir . Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner85.out Mungbean.outinner85.masked > Mungbean.outinner85.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner85.unmasked 50 > Mungbean.outinner85.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner85.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner85.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner85.clean_blastx.out.txt --outinner Mungbean.outinner85<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 CRL_Step3_Passed_Elements.fasta --resultfile Mungbean.result85 --innerfile passed_outinner_sequence.fasta --sequencefile standard_output.gapfilled.final.fa <br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step5.pl --LTR_blast lLTRs_Seq_For_BLAST.fasta.out --inner_blast Inner_Seq_For_BLAST.fasta.out --step3 CRL_Step3_Passed_Elements.fasta --final LTR85.lib --pcoverage 90 --pidentity 8<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib ../99/LTR99.lib -dir . LTR85.lib<br />
perl ~/bin/CRL_Scripts1.0/remove_masked_sequence.pl --masked_elements LTR85.lib.masked --outfile FinalLTR85.lib<br />
cd ..<br />
cat 99/LTR99.lib 85/FinalLTR85.lib > allLTR.lib</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-02T05:50:29Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined in one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the chr 3, 4 combined and chr 5 split linkage groups.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
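PE-pairing.py is not included in this note. A minimal sketch of read pairing, assuming it keeps only reads whose name occurs in both files and that mate names differ at most by a trailing /1 or /2; function names are hypothetical:<br />

```python
def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from FASTQ lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        yield header, seq, plus, qual


def read_key(header):
    """Read name without the leading '@' and without a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def pair_reads(records_1, records_2):
    """Return two lists holding only reads that have a mate, in file-1 order."""
    mates = {read_key(rec[0]): rec for rec in records_2}
    paired_1, paired_2 = [], []
    for rec in records_1:
        mate = mates.get(read_key(rec[0]))
        if mate is not None:
            paired_1.append(rec)
            paired_2.append(mate)
    return paired_1, paired_2
```

This holds all of file 2 in memory, which is fine for a chloroplast read subset but not for a whole-genome run.<br />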
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and the configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR).<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa).<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio_chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but SAM or BAM output is no longer supported in blasr, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
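The source of vcf_filtering.py is not recorded in these notes. The following is only a minimal sketch of one plausible filter, assuming the 0.15 in the output name is a minimum alternate-allele fraction computed from the INFO/AD tag written by mpileup. The function names and the interpretation of 0.15 are assumptions.<br />

```python
def alt_fraction(info_field):
    """Return the alternate-allele fraction from an INFO string that
    carries AD=ref,alt[,...]; return None if AD is absent or empty."""
    for entry in info_field.split(";"):
        if entry.startswith("AD="):
            depths = [int(x) for x in entry[3:].split(",")]
            total = sum(depths)
            if total == 0:
                return None
            return float(sum(depths[1:])) / total
    return None


def filter_vcf_lines(lines, min_fraction=0.15):
    """Keep header lines and records whose alt fraction is >= min_fraction.

    min_fraction=0.15 is an assumed meaning of the '0.15' in the file name.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fraction = alt_fraction(fields[7])  # column 8 of a VCF record is INFO
        if fraction is not None and fraction >= min_fraction:
            kept.append(line)
    return kept
```

Under these assumptions the filtered records could be printed to stdout and redirected to variants_PE_bwa_all_0.15.vcf, as in the command above.<br />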
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
write a script that reads a fasta file and an annotation file (gff or gb) and makes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
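getCDS.py itself is not shown here. As a rough sketch of the idea for the GFF case only (GenBank input is not covered), CDS features can be grouped by their Parent attribute, concatenated in coordinate order, and reverse-complemented on the minus strand. All function names below are hypothetical, and trans-spliced genes with exons on both strands would need extra handling.<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def cds_sequences(genome, gff_lines):
    """Return {gene: CDS sequence} built from the CDS rows of a GFF3 file."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        gene = attrs.get("Parent") or attrs.get("ID") or "unknown"
        segments[(gene, f[0], f[6])].append((int(f[3]), int(f[4])))
    result = {}
    for (gene, seqid, strand), spans in segments.items():
        spans.sort()
        # GFF coordinates are 1-based and inclusive
        seq = "".join(genome[seqid][start - 1:end] for start, end in spans)
        result[gene] = reverse_complement(seq) if strand == "-" else seq
    return result
```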
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
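The thresholds noted above (dp >= 5, missing < 13, hetero < 10) could be applied per site roughly as follows. This is only a sketch of one reading of those numbers, assuming that a genotype with DP below 5 counts as missing and that the other two limits are sample counts. vcfparse.py may work differently, and the function names are hypothetical.<br />

```python
def classify(sample_field, format_keys, min_dp=5):
    """Classify one sample column as 'missing', 'hom' or 'het'."""
    values = dict(zip(format_keys, sample_field.split(":")))
    gt = values.get("GT", "./.").replace("|", "/")
    dp = values.get("DP", ".")
    # Assumption: calls supported by fewer than min_dp reads are treated as missing
    if "." in gt or dp == "." or int(dp) < min_dp:
        return "missing"
    alleles = gt.split("/")
    return "hom" if len(set(alleles)) == 1 else "het"


def keep_site(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if the site has fewer than max_missing missing calls
    and fewer than max_hetero heterozygous calls."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column
    calls = [classify(sample, keys, min_dp) for sample in fields[9:]]
    return calls.count("missing") < max_missing and calls.count("het") < max_hetero
```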
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory that contains the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was re-installed with the 64-bit build flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
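The manual edit of out.gp can also be scripted. A minimal sketch, assuming only the "set mouse clipboardformat" line needs to go (the function names are hypothetical):<br />

```python
def strip_mouse_clipboardformat(gp_text):
    """Return the gnuplot script text without 'set mouse clipboardformat' lines."""
    kept = [
        line
        for line in gp_text.splitlines(True)  # True keeps the line endings
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]
    return "".join(kept)


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the script in place; 'out.gp' is the file named in the error above."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(strip_mouse_clipboardformat(text))
```

After calling fix_gnuplot_script(), running gnuplot on out.gp should get past that line. Other unsupported options, if any, would still need to be handled separately.<br />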
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
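blastparse.py is not shown in these notes. One plausible way to reduce the -outfmt 6 table to a single position per marker is to keep the hit with the highest bit score for each query. The sketch below assumes the default 12-column tabular layout, and the function name is hypothetical.<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} for the top-scoring hit.

    Assumes the default -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bits = int(f[8]), int(f[9]), float(f[11])
        if query not in best or bits > best[query][3]:
            # subject coordinates are reversed for minus-strand hits
            best[query] = (subject, min(sstart, send), max(sstart, send), bits)
    return best
```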
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
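The per-base output of samtools depth (contig, position, depth, tab-separated) is easier to compare between the Illumina and PacBio alignments after averaging over windows. A minimal sketch, with a window size of 10000 chosen purely for illustration:<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(contig, window_start): mean depth} from samtools depth lines.

    window_start is 1-based; window=10000 is an arbitrary illustrative value.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [depth sum, position count]
    for line in lines:
        if not line.strip():
            continue
        contig, pos, depth = line.rstrip("\n").split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1
        totals[(contig, start)][0] += int(depth)
        totals[(contig, start)][1] += 1
    return {
        key: float(total) / count
        for key, (total, count) in sorted(totals.items())
    }
```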
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
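fasta_devide.py is not included in the notes. contig_compare.sh expects 20 files named final.contigs_devide0.fa to final.contigs_devide19.fa, so a splitter along the following lines would fit. This is a sketch that assumes a round-robin split by record; the real script may split differently.<br />

```python
def split_fasta_records(lines, n_chunks=20):
    """Distribute FASTA records round-robin over n_chunks lists of lines."""
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header line starts the next record
        if index >= 0:
            chunks[index % n_chunks].append(line)
    return chunks


def write_chunks(chunks, prefix="final.contigs_devide"):
    """Write chunk i to <prefix><i>.fa, the names used in contig_compare.sh."""
    for i, chunk in enumerate(chunks):
        with open("%s%d.fa" % (prefix, i), "w") as handle:
            handle.writelines(chunk)
```

Round-robin assignment keeps the number of records per file even, but not the total sequence length, so the blat jobs may still finish at different times.<br />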
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
cp RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz RepeatMasker/.<br />
cd RepeatMasker/<br />
tar -xvzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
(the Libraries/ directory will be created and all files will be copied to RepeatMasker/Libraries/)<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, check these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
<br />
<big>'''Repeat masking progress'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/program/genometools-1.5.9/bin/)<br />
./gt suffixerator -db /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
./gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
./gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
./gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mv Repeat_*.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv CRL_Step2_Passed_Elements.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv Mungbean_* /data/skyts0401/Mungbean/repeatmask/.<br />
mv CRL_Step1_Passed_Elements.txt /data/skyts0401/Mungbean/repeatmask/.<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/.<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files --step2 CRL_Step2_Passed_Elements.fasta --pridentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ../<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
cp lLTR_Only.lib ../<br />
<br />
<br />
Identify elements with nested insertions<br />
cd /data/skyts0401/Mungbean/repeatmask/<br />
cat lLTR_Only.lib MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . LTR/Mungbean.outinner99 <br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner LTR/Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 LTR/CRL_Step3_Passed_Elements.fasta --resultfile LTR/Mungbean.result99 --innerfile passed_outinner_sequence.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
<br />
relatively old LTR<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR/85<br />
(to avoid confusing the LTR_99 results with these, make directories 99 and 85 in the LTR directory)<br />
(we already did suffixerator, so skip suffixerator)<br />
/data/skyts0401/program/genometools-1.5.9/bin/gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 -gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85<br />
cp ../99/CRL_Step1_Passed_Elements.txt .<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out85 --resultfile Mungbean.result85 --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mkdir fasta_files<br />
mv Repeat_*.fasta fasta_files<br />
mv CRL_Step2_Passed_Elements.fasta fasta_files<br />
cd fasta_files</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T09:28:13Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined in one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using chr 3, 4 combined and chr 5 split linkage group.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR).<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa).<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio_chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but SAM or BAM output is no longer supported in blasr, so just used the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
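vcf_filtering.py itself is not recorded in this note. The sketch below is only a guess at what such a filter could look like, assuming that the 0.15 in the output name is a minimum non-reference read fraction computed from the AD field of the first sample. The function names alt_fraction and filter_vcf are hypothetical.<br />

```python
import sys


def alt_fraction(vcf_line):
    """Return the non-reference read fraction from the first sample's AD field, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    if idx >= len(values):
        return None
    try:
        depths = [int(x) for x in values[idx].split(",")]
    except ValueError:
        # AD may be "." when no reads cover the site
        return None
    total = sum(depths)
    if total == 0:
        return None
    # the first AD value is the reference allele depth
    return (total - depths[0]) / total


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines plus records whose non-reference fraction is >= min_fraction.

    The 0.15 default is an assumption taken from the output file name.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        frac = alt_fraction(line)
        if frac is not None and frac >= min_fraction:
            yield line


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_vcf(handle))
```

Used like the command above, it would read one VCF path and write the filtered records to standard output.<br />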
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
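getCDS.py is not reproduced here. As a rough illustration of the GFF case only, the following sketch joins CDS segments per Parent attribute and reverse-complements minus-strand genes. All function names are hypothetical, GenBank input is not handled, and trans-spliced genes with segments on both strands (for example rps12 in chloroplasts) would need extra care.<br />

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence name: sequence}."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_from_gff(seqs, gff_lines):
    """Return {gene key: CDS sequence}, joining CDS rows that share a Parent (or ID)."""
    parts = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        key = attrs.get("Parent") or attrs.get("ID") or "%s:%s-%s" % (cols[0], cols[3], cols[4])
        parts.setdefault(key, []).append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for key, segs in parts.items():
        segs.sort(key=lambda s: s[1])
        # GFF coordinates are 1-based and inclusive
        seq = "".join(seqs[chrom][start - 1:end] for chrom, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = revcomp(seq)
        result[key] = seq
    return result
```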
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so the filter was tightened to missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
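cal_seg_dist.py is not shown in this note. A segregation distortion check is commonly done with a chi-square test against the expected ratio, so here is a minimal sketch assuming a 1:1 a:b expectation (as in a RIL population) and JoinMap-style genotype codes a, b, h and -. The function names and the 0.05 significance level are assumptions, not what the script actually uses.<br />

```python
# chi-square critical value for 1 degree of freedom at p = 0.05
CHI2_CRIT_DF1_P05 = 3.841


def chi_square_1to1(genotypes):
    """Chi-square statistic for a 1:1 a:b ratio; 'h' and '-' calls are ignored."""
    a = sum(1 for g in genotypes if g == "a")
    b = sum(1 for g in genotypes if g == "b")
    n = a + b
    if n == 0:
        return None
    expected = n / 2.0
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


def is_distorted(genotypes, critical=CHI2_CRIT_DF1_P05):
    """True when the marker deviates from 1:1 beyond the critical value."""
    stat = chi_square_1to1(genotypes)
    return stat is not None and stat > critical
```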
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The build fails because of -Werror, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with 64-bit support<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems the installed gnuplot is newer and no longer supports this option written by mummerplot. Edit out.gp and delete that line.<br />
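Removing the offending line by hand works, but it has to be repeated after every mummerplot run. A small sketch that drops it automatically is given here; the function name strip_clipboardformat is hypothetical, and only the "set mouse clipboardformat" line reported in the error is removed.<br />

```python
import sys


def strip_clipboardformat(lines):
    """Return the gnuplot script lines without 'set mouse clipboardformat ...'."""
    return [
        line for line in lines
        if not line.lstrip().startswith("set mouse clipboardformat")
    ]


if __name__ == "__main__":
    # usage (assumed): python fix_gp.py out.gp
    path = sys.argv[1]
    with open(path) as handle:
        kept = strip_clipboardformat(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
```

After that, running gnuplot on the edited out.gp should produce the plot.<br />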
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that markers from the same chromosome stay on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
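blastparse.py is not included in this note. For reference, a minimal sketch of reading the 12-column -outfmt 6 table and keeping the highest-bitscore hit per marker could look like the following. The function name best_hits and the choice of bitscore as the ranking key are assumptions.<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} keeping the top bitscore per query.

    Expects default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        sstart, send = int(cols[8]), int(cols[9])
        bitscore = float(cols[11])
        if query not in best or bitscore > best[query][3]:
            # minus-strand hits have sstart > send, so store ordered coordinates
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best
```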
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
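The depth files have one line per position (contig, position, depth), which is hard to read over a 2 Mb region. A minimal sketch for summarising them as mean depth per window is shown here. The function name window_means and the 10 kb window size are assumptions.<br />

```python
from collections import defaultdict


def window_means(lines, window=10000):
    """Return {(contig, window_start): mean depth} from samtools depth output lines."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        pos = int(cols[1])
        # 1-based start coordinate of the window containing pos
        start = ((pos - 1) // window) * window + 1
        key = (cols[0], start)
        sums[key] += int(cols[2])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        means = window_means(handle)
    for (contig, start), mean in sorted(means.items()):
        print("%s\t%d\t%.2f" % (contig, start, mean))
```

Comparing the Illumina and PacBio tables window by window makes drops in coverage easier to spot.<br />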
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
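contig_compare.sh expects 20 input files named final.contigs_devide0.fa to final.contigs_devide19.fa, which fasta_devide.py presumably creates. That script is not recorded here, so the following is only a sketch of one way to do the split, distributing records round-robin. The function names and the round-robin strategy are assumptions.<br />

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA lines."""
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line.rstrip("\n"), []])
        elif records and line.strip():
            records[-1][1].append(line.strip())
    return [(header, "".join(parts)) for header, parts in records]


def split_round_robin(records, n_chunks=20):
    """Distribute records over n_chunks lists so chunk sizes differ by at most one."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix):
    """Write chunk i to '<prefix><i>.fa', e.g. final.contigs_devide0.fa."""
    for i, chunk in enumerate(chunks):
        with open("%s%d.fa" % (prefix, i), "w") as out:
            for header, seq in chunk:
                out.write("%s\n%s\n" % (header, seq))


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        write_chunks(split_round_robin(parse_fasta(handle)), "final.contigs_devide")
```

Round-robin keeps the number of contigs per file even, but not the total bases, so blat jobs may still finish at different times.<br />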
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE\ Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, try installing these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
RECON - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz<br />
tar -xvzf RECON-1.08.tar.gz<br />
cd RECON-1.08/src/<br />
make<br />
make install<br />
cd ../scripts/<br />
nano recon.pl (added /data/skyts0401/program/RECON-1.08/bin to PATH = "" (third line))<br />
<br />
RepeatScout - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz<br />
tar -xvzf RepeatScout-1.0.5.tar.gz<br />
cd RepeatScout-1/<br />
make<br />
sudo make install<br />
<br />
nseg - for RepeatModeler<br />
(63:/data/skyts0401/program/)<br />
mkdir nseg<br />
cd nseg<br />
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/* .<br />
make<br />
<br />
<br />
<big>'''Repeat masking procedure'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Move Mungbean genome assembly final version<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/program/genometools-1.5.9/bin/)<br />
./gt suffixerator -db /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
./gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 \<br />
-maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
./gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
./gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 \<br />
--sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mv Repeat_*.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv CRL_Step2_Passed_Elements.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv Mungbean_* /data/skyts0401/Mungbean/repeatmask/.<br />
mv CRL_Step1_Passed_Elements.txt /data/skyts0401/Mungbean/repeatmask/.<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/.<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files --step2 CRL_Step2_Passed_Elements.fasta --pridentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ../<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
cp lLTR_Only.lib ../<br />
<br />
<br />
Identify elements with nested insertions<br />
cd /data/skyts0401/Mungbean/repeatmask/<br />
cat lLTR_Only.lib MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . LTR/Mungbean.outinner99 <br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner LTR/Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 LTR/CRL_Step3_Passed_Elements.fasta --resultfile LTR/Mungbean.result99 \<br />
--innerfile passed_outinner_sequence.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
<br />
relatively old LTR<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR/85<br />
(to keep the LTR_99 results separate from these results, make directories 99 and 85 in the LTR directory)<br />
(we already did suffixerator, so skip suffixerator)<br />
/data/skyts0401/program/genometools-1.5.9/bin/gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 \<br />
-gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T08:32:23Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
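The marker filter described above can be sketched as a single predicate. This is an illustration only: the note does not say whether the hetero threshold is a count or a percentage (a count is assumed here), depth is taken as a per-marker mean, and the function name keep_marker is hypothetical.<br />

```python
def keep_marker(genotypes, mean_depth, max_missing_frac=0.10, max_hetero=10, min_depth=3):
    """Return True when a marker passes the missing, hetero and depth thresholds.

    genotypes: list of JoinMap-style calls ('a', 'b', 'h', '-') for one marker.
    Assumption: max_hetero is a count of 'h' calls, not a percentage.
    """
    if not genotypes:
        return False
    missing = genotypes.count("-")
    hetero = genotypes.count("h")
    return (
        missing / len(genotypes) <= max_missing_frac
        and hetero <= max_hetero
        and mean_depth >= min_depth
    )
```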
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
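PE-pairing.py is not recorded in this note. Pairing usually means keeping only the reads that appear in both FASTQ files, in the same order, so a minimal sketch under that assumption is given here. It matches reads by name after stripping a trailing /1 or /2, holds everything in memory, and all function names are hypothetical.<br />

```python
def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples; a truncated last record is skipped."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield tuple(lines[i:i + 4])


def read_key(header):
    """Read name without the leading '@' and without a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_reads(lines1, lines2):
    """Return two lists of records containing only reads present in both inputs."""
    second = {read_key(rec[0]): rec for rec in fastq_records(lines2)}
    paired1, paired2 = [], []
    for rec in fastq_records(lines1):
        key = read_key(rec[0])
        if key in second:
            paired1.append(rec)
            paired2.append(second[key])
    return paired1, paired2
```

For full-size Illumina files a streaming or indexed approach would be needed instead of loading both files.<br />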
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Quiver requires the PacBio chloroplast reads to be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried the blasr algorithm first, but blasr no longer supports SAM or BAM output, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so the filter was tightened to missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The build fails because of the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add the path /home/skyts0401/lastz-distrib/bin to .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp and delete that line.<br />
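Instead of editing by hand, the offending line can be dropped with a few lines of Python 3. This is only a sketch; it assumes the script is the default out.gp written by mummerplot, and the function names are made up for illustration.<br />

```python
def strip_clipboardformat(lines):
    """Drop the 'set mouse clipboardformat' line that newer gnuplot rejects."""
    return [line for line in lines
            if not line.lstrip().startswith("set mouse clipboardformat")]


def fix_gnuplot_script(path="out.gp"):
    """Rewrite the gnuplot script in place without the unsupported line."""
    with open(path) as handle:
        kept = strip_clipboardformat(handle.readlines())
    with open(path, "w") as handle:
        handle.writelines(kept)
```

After calling fix_gnuplot_script("out.gp"), running gnuplot on out.gp should draw the plot.<br />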
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map the 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
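blastparse.py is not shown. With -outfmt 6 each line has 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so picking one location per marker could be sketched in Python 3 as below. Keeping the hit with the highest bitscore is an assumption.<br />

```python
def best_hits(lines):
    """Return {query: (subject, start, end, bitscore)} keeping the top-bitscore hit per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bitscore = int(f[8]), int(f[9]), float(f[11])
        if query not in best or bitscore > best[query][3]:
            # minus-strand hits have sstart > send, so store ordered coordinates
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best
```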
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some alignments looked like split mappings, so re-align with bowtie2 in end-to-end mode<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
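samtools depth writes three tab-separated columns (sequence name, 1-based position, depth). Below is a small Python 3 sketch for summarising the two .mapping.depth files in fixed windows so the Illumina and PacBio profiles can be compared. The window size and the function name are arbitrary choices.<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Return {(sequence, window_start): mean depth} from samtools depth output lines."""
    totals = defaultdict(lambda: [0, 0])  # (sequence, window_start) -> [depth sum, positions]
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1
        bucket = totals[(chrom, start)]
        bucket[0] += int(depth)
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}
```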
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
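The blat loop in contig_compare.sh expects 20 input files named final.contigs_devide0.fa to final.contigs_devide19.fa. fasta_devide.py is not shown; one simple way to produce such files is sketched below in Python 3. The round-robin assignment of records to files is an assumption.<br />

```python
def read_records(lines):
    """Return a list of (header_line, sequence) tuples from FASTA lines."""
    records, header, seq = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_round_robin(records, n_chunks=20):
    """Distribute records over n_chunks lists, one record at a time."""
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix="final.contigs_devide"):
    """Write chunk i to <prefix>i.fa, matching the names used in contig_compare.sh."""
    for index, chunk in enumerate(chunks):
        with open("%s%d.fa" % (prefix, index), "w") as handle:
            for header, sequence in chunk:
                handle.write("%s\n%s\n" % (header, sequence))
```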
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
The Mungbean super scaffold (JM-2.fasta) was gap-filled. The final assembly FASTA is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
a Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if the build fails on a missing dependency, install these -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
<br />
<big>'''Repeat masking process'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the Mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > ../MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/program/genometools-1.5.9/bin/)<br />
./gt suffixerator -db /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
./gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 \<br />
-maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
./gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
./gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 \<br />
--sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mv Repeat_*.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv CRL_Step2_Passed_Elements.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv Mungbean_* /data/skyts0401/Mungbean/repeatmask/.<br />
mv CRL_Step1_Passed_Elements.txt /data/skyts0401/Mungbean/repeatmask/.<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/.<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files/<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step3.pl --directory /data/skyts0401/Mungbean/repeatmask/LTR/fasta_files --step2 CRL_Step2_Passed_Elements.fasta --pridentity 60 --seq_c 25<br />
mv CRL_Step3_Passed_Elements.fasta ../<br />
cd ..<br />
perl ~/bin/CRL_Scripts1.0/ltr_library.pl --resultfile Mungbean.result99 --step3 CRL_Step3_Passed_Elements.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
cp lLTR_Only.lib ../<br />
<br />
<br />
Identify elements with nested insertions<br />
cd /data/skyts0401/Mungbean/repeatmask/<br />
cat lLTR_Only.lib MITE.lib > repeats_to_mask_LTR99.fasta<br />
/data/skyts0401/program/RepeatMasker/RepeatMasker -lib repeats_to_mask_LTR99.fasta -nolow -dir . LTR/Mungbean.outinner99 <br />
perl ~/bin/CRL_Scripts1.0/cleanRM.pl Mungbean.outinner99.out Mungbean.outinner99.masked > Mungbean.outinner99.unmasked<br />
perl ~/bin/CRL_Scripts1.0/rmshortinner.pl Mungbean.outinner99.unmasked 50 > Mungbean.outinner99.clean<br />
makeblastdb -in ~/bin/Tpases020812DNA -dbtype prot<br />
blastx -query Mungbean.outinner99.clean -db ~/bin/Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Mungbean.outinner99.clean_blastx.out.txt<br />
perl ~/bin/CRL_Scripts1.0/outinner_blastx_parse.pl --blastx Mungbean.outinner99.clean_blastx.out.txt --outinner LTR/Mungbean.outinner99<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step4.pl --step3 LTR/CRL_Step3_Passed_Elements.fasta --resultfile LTR/Mungbean.result99 \<br />
--innerfile passed_outinner_sequence.fasta --sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa<br />
makeblastdb -in lLTRs_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query lLTRs_Seq_For_BLAST.fasta -db lLTRs_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out lLTRs_Seq_For_BLAST.fasta.out<br />
makeblastdb -in Inner_Seq_For_BLAST.fasta -dbtype nucl<br />
blastn -query Inner_Seq_For_BLAST.fasta -db Inner_Seq_For_BLAST.fasta -evalue 1e-10 -num_descriptions 1000 -out Inner_Seq_For_BLAST.fasta.out<br />
<br />
relatively old LTR<br />
cd /data/skyts0401/Mungbean/repeatmask/LTR<br />
(we already did suffixerator, so skip suffixerator)<br />
/data/skyts0401/program/genometools-1.5.9/bin/gt ltrharvest -index Mungbean_LTR -out Mungbean.out85 -outinner Mungbean.outinner85 \<br />
-gff3 Mungbean.gff85 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -vic 10 > Mungbean.result85</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T07:58:24Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so construct the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
Move SAM data (reseq data aligned to the PacBio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), using only the combined chr 3/4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copy the sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
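PE-pairing.py is not shown. The idea of re-pairing two FASTQ files, keeping only reads present in both, could be sketched in Python 3 as below. It assumes 4-line FASTQ records and read names that differ only by a trailing /1 or /2, and it holds everything in memory, so it is only suitable for small inputs; the function names are illustrative.<br />

```python
def read_fastq(lines):
    """Return {read_id: [4 FASTQ lines]}, with any trailing /1 or /2 removed from the id."""
    records, block = {}, []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            read_id = block[0][1:].split()[0]
            if read_id.endswith("/1") or read_id.endswith("/2"):
                read_id = read_id[:-2]
            records[read_id] = block
            block = []
    return records


def pair_reads(fastq1_lines, fastq2_lines):
    """Return two record lists holding only reads found in both files, in file-1 order."""
    first = read_fastq(fastq1_lines)
    second = read_fastq(fastq2_lines)
    shared = [read_id for read_id in first if read_id in second]
    return [first[i] for i in shared], [second[i] for i in shared]
```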
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
Before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Assemble the Mungbean PacBio corrected chloroplast reads with canu, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM or BAM is no longer supported in blasr, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
SNP calling is done; filter SNPs for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The build fails because of the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add the path /home/skyts0401/lastz-distrib/bin to .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option, which mummerplot writes into out.gp. Just edit out.gp and delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map the 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with bowtie2 in end-to-end mode<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these packages -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz<br />
<br />
DNA transposons protein database<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz<br />
gunzip Tpases020812DNA.gz<br />
<br />
<br />
<big>'''Repeat masking process'''</big><br />
<br />
The basic commands I used are based on http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced<br />
<br />
<br />
Copy the final version of the Mungbean genome assembly<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/gapfilled_assembly_final/standard_output.gapfilled.final.fa .<br />
<br />
MITE library<br />
(63:/data/skyts0401/program/MITE_Hunter/)<br />
perl MITE_Hunter_manager.pl -i /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -g Mungbean -c 10 -S 12345678<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/MITE/.<br />
cd /data/skyts0401/Mungbean/repeatmask/MITE/<br />
cat Mungbean_Step8_*.fa > MITE.lib<br />
<br />
LTR library<br />
(63:/data/skyts0401/program/genometools-1.5.9/bin/)<br />
./gt suffixerator -db /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa -indexname Mungbean_LTR -tis -suf -lcp -des -ssp -dna<br />
./gt ltrharvest -index Mungbean_LTR -out Mungbean.out99 -outinner Mungbean.outinner99 -gff3 Mungbean.gff99 -minlenltr 100 -maxlenltr 6000 -mindistltr 1500 \<br />
-maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > Mungbean.result99<br />
./gt gff3 -sort Mungbean.gff99 > Mungbean.gff99.sort<br />
./gt ltrdigest -trnas ~/bin/eukaryotic-tRNAs.fa Mungbean.gff99.sort Mungbean_LTR > Mungbean.gff99.dgt<br />
perl ~/bin/CRL_Scripts1.0/CRL_Step1.pl --gff Mungbean.gff99.dgt <br />
perl ~/bin/CRL_Scripts1.0/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile Mungbean.out99 --resultfile Mungbean.result99 \<br />
--sequencefile /data/skyts0401/Mungbean/assembly/standard_output.gapfilled.final.fa --removed_repeats CRL_Step2_Passed_Elements.fasta<br />
mv Repeat_*.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv CRL_Step2_Passed_Elements.fasta /data/skyts0401/Mungbean/repeatmask/fasta_files/.<br />
mv Mungbean_* /data/skyts0401/Mungbean/repeatmask/.<br />
mv CRL_Step1_Passed_Elements.txt /data/skyts0401/Mungbean/repeatmask/.<br />
mv Mungbean* /data/skyts0401/Mungbean/repeatmask/.</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T05:48:08Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map is constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
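<br />
The three thresholds above can be sketched as a small filter. This is only an illustration and not the script actually used: the function name, the genotype codes ("a", "b", "h", "-") and the reading of "hetero > 10" as a count of heterozygous calls are all assumptions.<br />

```python
# Hypothetical marker filter; names and genotype codes are assumptions.
# "-" = missing call, "h" = heterozygous call.

def keep_marker(genotypes, mean_depth,
                max_missing=0.10, max_hetero=10, min_depth=3):
    """Return True if a marker passes the missing, hetero and depth filters."""
    n = len(genotypes)
    if n == 0:
        return False
    missing = sum(1 for g in genotypes if g == "-")
    hetero = sum(1 for g in genotypes if g == "h")
    if missing / n > max_missing:   # too many missing calls
        return False
    if hetero > max_hetero:         # too many heterozygous calls (assumed count)
        return False
    if mean_depth < min_depth:      # not enough read support
        return False
    return True
```

If the hetero threshold is really a percentage, only the comparison on that line needs to change.<br />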
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3, 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we get 2 contigs (one contig has LSC+IR, and the other has SSC+IR)<br />
<br />
then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
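<br />
A minimal sketch of what a filter like vcf_filtering.py might do, assuming the "0.15" in the output name is a minimum alternate-allele fraction computed from the AD field requested with -t AD. The function names and the threshold meaning are assumptions, not the original script.<br />

```python
# Hypothetical VCF filter on alternate-allele fraction (assumed meaning of 0.15).

def alt_fraction(format_field, sample_field):
    """Return alt depth / total depth from the AD entry, or None if unavailable."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    if idx >= len(values):
        return None
    try:
        depths = [int(x) for x in values[idx].split(",")]
    except ValueError:
        return None          # e.g. "." for a missing value
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total   # first entry is the reference allele


def filter_vcf_lines(lines, min_fraction=0.15):
    """Keep header lines and single-sample records with enough alt support."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.split("\t")
        if len(cols) < 10:
            continue         # no sample column to inspect
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac >= min_fraction:
            kept.append(line)
    return kept
```

Only the first sample column is inspected, which matches a single-BAM mpileup like the one above.<br />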
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
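<br />
The GFF case of such a script could look roughly like the sketch below. It is not getCDS.py itself: the function names are made up, CDS parts are grouped by the Parent (or ID) attribute, and trans-spliced chloroplast genes and GenBank input are not handled.<br />

```python
# Hypothetical sketch of CDS extraction from FASTA + GFF3 lines.
from collections import OrderedDict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into an ordered {name: sequence} mapping."""
    seqs = OrderedDict()
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return OrderedDict((k, "".join(v)) for k, v in seqs.items())


def cds_from_gff(genome, gff_lines):
    """Join CDS segments per gene; reverse-complement minus-strand genes."""
    parts = OrderedDict()  # key -> (seqid, strand, [(start, end), ...])
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        key = (attrs.get("Parent") or attrs.get("ID")
               or "%s:%s-%s" % (cols[0], cols[3], cols[4]))
        entry = parts.setdefault(key, (cols[0], cols[6], []))
        entry[2].append((int(cols[3]), int(cols[4])))
    result = OrderedDict()
    for key, (seqid, strand, segs) in parts.items():
        # GFF coordinates are 1-based and inclusive
        seq = "".join(genome[seqid][s - 1:e] for s, e in sorted(segs))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        result[key] = seq
    return result
```

The result can be written out as ">name" and sequence lines to get a file like vr.pb.cp.gene.fasta.<br />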
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
the Makefile causes a build problem, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was rebuilt with 64-bit support enabled<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option written by mummerplot. Just edit out.gp to delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 resequencing markers per LG from the previous reference onto the PacBio super scaffolds to check that markers from the same LG land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with bowtie2 in end-to-end mode<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
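<br />
The .mapping.depth files have three tab-separated columns (contig, position, depth). A rough way to summarize one is sketched below; the function name is hypothetical and it assumes a single contig region, as produced by the -r option above.<br />

```python
# Hypothetical summary of `samtools depth -a` output for one region.

def summarize_depth(lines):
    """Return (mean depth, list of (start, end) runs with zero depth)."""
    total = 0
    n = 0
    gaps = []
    gap_start = None
    last_pos = None
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue
        pos, depth = int(cols[1]), int(cols[2])
        total += depth
        n += 1
        if depth == 0:
            if gap_start is None:
                gap_start = pos          # a zero-depth run begins here
        elif gap_start is not None:
            gaps.append((gap_start, last_pos))
            gap_start = None
        last_pos = pos
    if gap_start is not None:            # run reaches the end of the region
        gaps.append((gap_start, last_pos))
    mean = total / n if n else 0.0
    return mean, gaps
```

Comparing the zero-depth runs of the Illumina and PacBio files for the same region is one way to spot suspicious parts of a contig.<br />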
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
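<br />
The loop above expects 20 input files named final.contigs_devide0.fa to final.contigs_devide19.fa. A minimal sketch of how a splitter like fasta_devide.py could produce them is shown here; the function names and the round-robin assignment are assumptions.<br />

```python
# Hypothetical FASTA splitter; round-robin assignment is an assumption.

def split_records(records, n_parts=20):
    """Distribute (name, sequence) records over n_parts lists in turn."""
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def write_parts(parts, prefix="final.contigs_devide"):
    """Write each part to <prefix><index>.fa, matching the names used by the loop."""
    for i, part in enumerate(parts):
        with open("%s%d.fa" % (prefix, i), "w") as out:
            for name, seq in part:
                out.write(">%s\n%s\n" % (name, seq))
```

Round-robin keeps the record counts even but not the total bases; if contig sizes vary a lot, assigning each record to the currently smallest part would balance blat run times better.<br />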
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
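<br />
The contents of pslfilter.py are not recorded here. As an illustration only, a filter that keeps the best blat hit per query contig could look like this; the function name and the 0.5 coverage cutoff are assumptions. Column positions follow the standard PSL layout (matches in column 1, qName in 10, qSize in 11, tName in 14).<br />

```python
# Hypothetical best-hit filter for blat PSL lines.

def best_psl_hits(lines, min_coverage=0.5):
    """Return {query: (target, matches)} for the hit with most matching bases."""
    best = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # skip the PSL header and malformed lines
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        matches = int(cols[0])
        q_name = cols[9]
        q_size = int(cols[10])
        t_name = cols[13]
        if q_size == 0 or matches / q_size < min_coverage:
            continue                     # hit covers too little of the query
        if q_name not in best or matches > best[q_name][1]:
            best[q_name] = (t_name, matches)
    return best
```

Feeding it the lines of every file listed in psl.list gives one target contig per query contig.<br />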
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
Before running RepeatMasker, please install Repbase, rmblast, trf(Tandem Repeat Finder)<br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these packages -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T05:47:43Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parsed genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 were filtered out<br />
While grouping, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Constructed genetic map (JoinMap 4.1), using only the combined chr 3 and 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis using R/qtl (desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just to check the loci<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
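<br />
PE-pairing.py itself is not recorded in these notes. A minimal sketch of what such a pairing step might look like, assuming it keeps only reads present in both FASTQ files and that read IDs may carry a /1 or /2 suffix; the function names are hypothetical:<br />

```python
def parse_fastq(lines):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        read_id = lines[i][1:].split()[0]
        # Assumption: mates are distinguished by a trailing /1 or /2.
        if read_id.endswith("/1") or read_id.endswith("/2"):
            read_id = read_id[:-2]
        yield read_id, lines[i:i + 4]


def pair_reads(fq1_lines, fq2_lines):
    """Return (paired1, paired2): records whose read ID occurs in both inputs."""
    mates2 = dict(parse_fastq(fq2_lines))
    paired1, paired2 = [], []
    for read_id, record in parse_fastq(fq1_lines):
        if read_id in mates2:
            paired1.append(record)
            paired2.append(mates2[read_id])
    return paired1, paired2
```

This version holds the second file in memory, so for full Illumina runs a streaming or sorted-merge approach would probably be needed.<br />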
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig contains LSC+IR, the other contains SSC+IR)<br />
<br />
then simply assembled them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports SAM or BAM output, so used the bowtie algorithm instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (suspected library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
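<br />
getCDS.py is not included in these notes. A minimal sketch of the GFF case, assuming GFF3-style attributes (ID= / Parent=) and that all CDS segments of one gene lie on the same strand; the function names are hypothetical, and trans-spliced genes with mixed strands are not handled:<br />

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(value) for key, value in seqs.items()}


def cds_from_gff(seqs, gff_lines):
    """Return {gene_id: CDS sequence}, joining CDS rows that share Parent/ID."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene = attrs.get("Parent") or attrs.get("ID") or attrs.get("Name")
        parts[gene].append((int(cols[3]), int(cols[4]), cols[6], cols[0]))
    out = {}
    for gene, segments in parts.items():
        segments.sort()  # order segments by start coordinate
        strand = segments[0][2]
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(seqs[chrom][start - 1:end] for start, end, _, chrom in segments)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        out[gene] = seq
    return out
```

Each entry of the returned dictionary can then be written as a >gene_id header followed by its sequence.<br />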
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
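<br />
vcfparse.py is not shown here. A minimal sketch of the filter thresholds noted above (dp >= 5, missing < 13, hetero < 10), assuming that the thresholds are sample counts and that a genotype with depth below 5 is treated as missing; both are assumptions, and the function name is hypothetical:<br />

```python
def keep_marker(genotypes, depths, min_dp=5, max_missing=13, max_hetero=10):
    """Return True when a marker passes the depth/missing/hetero thresholds.

    genotypes: VCF GT strings per sample, e.g. "0/0", "0/1", "1/1", "./."
    depths:    per-sample read depths, in the same order as genotypes
    """
    missing = 0
    hetero = 0
    for gt, dp in zip(genotypes, depths):
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or dp < min_dp:
            missing += 1  # no call, or too shallow to trust (assumption)
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Re-running with max_missing=12 would correspond to a stricter missing-data cutoff.<br />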
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
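<br />
cal_seg_dist.py is not recorded here. A minimal sketch of a segregation distortion check, assuming a 1:1 expected ratio of the two parental genotypes and a chi-square test with 1 degree of freedom (critical value 3.841 at p = 0.05); the population type and the test actually used are assumptions:<br />

```python
def seg_distortion_chi2(n_a, n_b):
    """Chi-square statistic for an expected 1:1 ratio of genotype counts."""
    total = n_a + n_b
    if total == 0:
        return 0.0
    expected = total / 2.0
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def is_distorted(n_a, n_b, critical=3.841):
    """True when the marker deviates from 1:1 at p < 0.05 (df = 1)."""
    return seg_distortion_chi2(n_a, n_b) > critical
```

Missing and heterozygous calls are left out of the counts in this sketch.<br />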
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The Makefile fails to build as-is; delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit flag <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option, which mummerplot writes into the script. Edit out.gp and delete that line.<br />
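<br />
The manual edit can also be scripted. A minimal sketch, assuming the only offending statement is the one starting with "set mouse clipboardformat"; the function name is hypothetical:<br />

```python
def strip_gnuplot_option(script_text, option="set mouse clipboardformat"):
    """Return the gnuplot script without lines that start with `option`."""
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if not line.lstrip().startswith(option)
    ]
    return "".join(kept)
```

Read out.gp, pass its text through this function, write the result back, and then run gnuplot on the edited file.<br />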
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Mapped 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
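<br />
blastparse.py is not shown in these notes. A minimal sketch of parsing the tabular output (-outfmt 6, default 12 columns) and keeping the best hit per marker by bit score; choosing bit score as the criterion is an assumption, and the function name is hypothetical:<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, subject_start, bitscore)} for the top hit."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip malformed or empty lines
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        query, subject = cols[0], cols[1]
        sstart = int(cols[8])
        bitscore = float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, sstart, bitscore)
    return best
```

The subject name and start position of each best hit give the marker location on the new assembly, which is the kind of information a chromosome comparison figure needs.<br />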
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
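<br />
The per-base output of samtools depth (chromosome, position, depth, tab-separated) is easier to compare after averaging over windows. A minimal sketch; the window size and function name are hypothetical choices, not taken from these notes:<br />

```python
def window_mean_depth(depth_lines, window=10000):
    """Return [(chrom, window_start, mean_depth)] from samtools depth lines."""
    totals = {}
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window)  # positions are 1-based
        depth_sum, count = totals.get(key, (0, 0))
        totals[key] = (depth_sum + int(depth), count + 1)
    return [
        (chrom, index * window + 1, depth_sum / count)
        for (chrom, index), (depth_sum, count) in sorted(totals.items())
    ]
```

Running it on both the Illumina and PacBio depth files gives two window tables for the same region that can be compared side by side.<br />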
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
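<br />
fasta_devide.py is not included here. A minimal sketch of splitting a FASTA file into 20 parts so that the blat jobs in contig_compare.sh can run in parallel, assuming a round-robin split by record; the function name is hypothetical:<br />

```python
def split_fasta(lines, n_parts=20):
    """Distribute FASTA records round-robin into n_parts lists of lines."""
    parts = [[] for _ in range(n_parts)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header line starts the next record
        if index >= 0:
            parts[index % n_parts].append(line)
    return parts
```

Writing part i to final.contigs_devide{i}.fa would match the file names that contig_compare.sh expects.<br />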
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
Before running RepeatMasker, install Repbase, rmblast, and TRF (Tandem Repeats Finder)<br />
<br />
<br />
Repbase - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
Registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast - for RepeatMasker (ver 2.6.0 has problem with install, so I installed v. 2.2.28)<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf - for RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s ../trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
During configuration, enter the paths to trf and rmblast<br />
<br />
muscle - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://www.drive5.com/muscle/<br />
wget http://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz<br />
tar -xvzf muscle3.8.31_i86linux64.tar.gz<br />
mkdir muscle<br />
mv muscle3.8.31_i86linux64 muscle/<br />
<br />
mdust - for MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
wget ftp://occams.dfci.harvard.edu/pub/bio/tgi/software//seqclean/mdust.tar.gz<br />
tar -xvzf mdust.tar.gz<br />
<br />
MITE-Hunter<br />
(63:/data/skyts0401/program/)<br />
check version on http://target.iplantcollaborative.org/mite_hunter.html<br />
wget http://target.iplantcollaborative.org/mite_hunter/MITE%20Hunter-11-2011.zip<br />
unzip MITE\ Hunter-11-2011.zip<br />
mv MITE\ Hunter/ MITE_Hunter<br />
cd MITE_Hunter/<br />
perl MITE_Hunter_Installer.pl -d /data/skyts0401/program/MITE_Hunter -f formatdb -b blastall -m /data/skyts0401/program/mdust -M /data/skyts0401/program/muscle<br />
<br />
GenomeTools<br />
(63:/data/skyts0401/program/)<br />
check version on http://genometools.org/<br />
wget http://genometools.org/pub/genometools-1.5.9.tar.gz<br />
tar -xvzf genometools-1.5.9.tar.gz<br />
cd genometools-1.5.9/<br />
make<br />
sudo make install<br />
- if there is a dependency problem, install these packages -<br />
sudo apt-get install libcairo2-dev<br />
sudo apt-get install libpango1.0-dev<br />
<br />
Genome tRNA database<br />
(63:/home/skyts0401/bin/)<br />
check version on http://gtrnadb.ucsc.edu<br />
wget http://gtrnadb2009.ucsc.edu/download/tRNAs/eukaryotic-tRNAs.fa.gz<br />
gunzip eukaryotic-tRNAs.fa.gz<br />
<br />
CRL scripts<br />
(63:/home/skyts0401/bin/)<br />
wget http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz<br />
tar -xvzf CRL_Scripts1.0.tar.gz</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T04:30:29Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parsed genotype data (JoinMap) and phenotype data into IciMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map was wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 were filtered out<br />
While grouping, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; needs checking.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Constructed genetic map (JoinMap 4.1), using only the combined chr 3 and 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis using R/qtl (desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just to check the loci<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
Copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gave 2 contigs (one contig contains LSC+IR, the other contains SSC+IR)<br />
<br />
then simply assembled them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
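A minimal sketch of what a script like getCDS.py could do for the GFF case. The real getCDS.py is not shown in these notes, so the function names and the grouping of CDS rows by Parent/ID below are assumptions, and GenBank input is not handled.<br />

```python
from collections import defaultdict

# Translation table for complementing a DNA string (hypothetical helper).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(chunks) for key, chunks in seqs.items()}


def cds_from_gff(seqs, gff_lines):
    """Join the CDS segments of each gene and return {gene: cds_sequence}."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumption: segments of one gene share a Parent (or ID) attribute.
        gene = attrs.get("Parent") or attrs.get("ID") or attrs.get("Name", "unknown")
        parts[(gene, cols[0], cols[6])].append((int(cols[3]), int(cols[4])))
    result = {}
    for (gene, seqid, strand), segments in parts.items():
        segments.sort()
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(seqs[seqid][start - 1:end] for start, end in segments)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        result[gene] = seq
    return result
```

Trans-spliced chloroplast genes such as rps12, whose exons lie on different strands, would need extra handling that this sketch does not attempt.<br />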
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
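The filtering step above (dp >= 5, missing < 13, hetero < 10) could be sketched as follows. The actual vcfparse.py is not shown here, so this is only an illustration: the function name is hypothetical, and it assumes the thresholds are sample counts and that a genotype below the depth cutoff is counted as missing.<br />

```python
def keep_site(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if a VCF data line passes the missing/hetero filters."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    gt_index = fmt.index("GT")
    dp_index = fmt.index("DP") if "DP" in fmt else None
    missing = hetero = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        alleles = fields[gt_index].replace("|", "/").split("/")
        depth = 0
        if dp_index is not None and dp_index < len(fields) and fields[dp_index].isdigit():
            depth = int(fields[dp_index])
        # Assumption: low-depth genotypes are treated like missing calls.
        if "." in alleles or (dp_index is not None and depth < min_dp):
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero
```

Header lines (starting with '#') should be passed through unchanged before calling this on each data line.<br />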
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
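Deleting that line by hand can also be scripted. A small sketch, assuming the offending lines all start with "set mouse clipboardformat" (the function name is hypothetical):<br />

```python
def drop_unsupported_lines(gp_text, prefix="set mouse clipboardformat"):
    """Return gnuplot script text without the lines that start with `prefix`."""
    kept = [
        line
        for line in gp_text.splitlines(keepends=True)
        if not line.lstrip().startswith(prefix)
    ]
    return "".join(kept)
```

To use it, read out.gp, pass the text through drop_unsupported_lines, write it back, and then run gnuplot on out.gp again.<br />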
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
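A sketch of the kind of parsing a script like blastparse.py might perform on the -outfmt 6 table. The real script is not shown here, so keeping only the highest-bitscore hit per marker is an assumption, as is the function name.<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, subject_start, bitscore)} for the top hit per query.

    Expects the default 12 columns of BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        query, subject = cols[0], cols[1]
        subject_start, bitscore = int(cols[8]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, subject_start, bitscore)
    return best
```

Comparing the chromosome of each marker's best hit with its original chromosome then shows whether the markers stay together.<br />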
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like they were split-mapped, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
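The per-base depth files are large, so a windowed mean is easier to compare between the Illumina and PacBio alignments. A minimal sketch, assuming the usual three-column samtools depth output (contig, 1-based position, depth); the function name and the window size are arbitrary choices for illustration.<br />

```python
from collections import defaultdict


def windowed_mean_depth(depth_lines, window=10000):
    """Return {(contig, window_start): mean_depth} from samtools depth lines."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        key = (contig, (pos - 1) // window)
        totals[key] += depth
        counts[key] += 1
    return {
        (contig, index * window + 1): totals[(contig, index)] / counts[(contig, index)]
        for (contig, index) in totals
    }
```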
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
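The blat loop above runs over 20 pieces of the contig FASTA. A sketch of how a script like fasta_devide.py might divide the records; the real script is not shown, so the round-robin assignment and the function name are assumptions.<br />

```python
def split_fasta(lines, n_parts=20):
    """Distribute FASTA records round-robin into n_parts chunks of text."""
    parts = [[] for _ in range(n_parts)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header line starts the next record
        if index >= 0:
            parts[index % n_parts].append(line.rstrip("\n") + "\n")
    return ["".join(part) for part in parts]
```

Each returned chunk can be written to its own file (for example one per value of i in the loop), so that the 20 blat jobs get similar numbers of contigs.<br />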
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking progress is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
Before running RepeatMasker, please install Repbase, rmblast, trf(Tandem Repeat Finder)<br />
<br />
<br />
Repbase<br />
(63:/data/skyts0401/program/)<br />
registration is required at http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast<br />
(63:/data/skyts0401/program/)<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
ver 2.6.0 had an installation problem, so v2.2.28 was installed instead<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf<br />
(63:/data/skyts0401/program/)<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf<br />
<br />
RepeatMasker<br />
(63:/data/skyts0401/program/)<br />
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-7.tar.gz<br />
tar -xzf RepeatMasker-open-4-0-7.tar.gz<br />
cd RepeatMasker/<br />
(move the Repbase library to RepeatMasker/Libraries/)<br />
perl ./configure<br />
configure directory of trf, rmblast</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T04:00:48Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the linkage groups with chr 3 and 4 combined and chr 5 split.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM output is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds to check that the same markers are on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like they were split-mapped, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
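The .mapping.depth files hold one tab-separated line per position (contig, position, depth). A minimal sketch for summarising them into windowed mean depth, so that coverage drops in the Illumina and pacbio tracks can be compared; the window size and the function name window_mean_depth are assumptions, not part of the original workflow:<br />

```python
from collections import defaultdict


def window_mean_depth(lines, window=10000):
    """Average depth per (contig, window start) from 'contig<TAB>pos<TAB>depth' lines."""
    total = defaultdict(int)
    count = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        contig, pos, depth = line.rstrip("\n").split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1  # 1-based start of the window
        total[(contig, start)] += int(depth)
        count[(contig, start)] += 1
    return {key: total[key] / count[key] for key in total}
```

Because samtools depth was run with -a, zero-depth positions are included in the file and pull the window mean down as expected.<br />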
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
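pslfilter.py and pslfilter2.py are not shown here. A minimal sketch of a PSL filter, assuming the goal is to keep, for each query contig, the target with the highest fraction of matched query bases; the 0.9 threshold and the name filter_psl are hypothetical:<br />

```python
def filter_psl(lines, min_cov=0.9):
    """Return {query: (target, coverage)} for the best hit with coverage >= min_cov."""
    best = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip the PSL header lines written by blat
        # PSL columns used: 0 = matches, 9 = qName, 10 = qSize, 13 = tName
        matches, q_name, q_size, t_name = int(f[0]), f[9], int(f[10]), f[13]
        cov = matches / q_size
        if cov >= min_cov and (q_name not in best or cov > best[q_name][1]):
            best[q_name] = (t_name, cov)
    return best
```

To process every file named in psl.list, the lines of each listed file can be chained and passed to the same function.<br />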
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure follows these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
<br />
Before running RepeatMasker, please install Repbase, rmblast, trf(Tandem Repeat Finder)<br />
<br />
<br />
Repbase<br />
63:/data/skyts0401/program/<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast<br />
63:/data/skyts0401/program/<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
ver 2.6.0 failed to install, so I installed v 2.2.28 instead<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf<br />
63:/data/skyts0401/program/<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T04:00:13Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
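A minimal sketch of the marker filter described above. The notes do not say how genotypes are coded or whether the hetero threshold is a count or a percentage, so this assumes JoinMap-style calls ('a', 'b', 'h', '-' for missing) and treats hetero as a count; keep_marker is a hypothetical name:<br />

```python
def keep_marker(genotypes, depth, max_missing=0.10, max_hetero=10, min_depth=3):
    """Return True if a marker passes the missing, hetero and depth thresholds."""
    missing = sum(g == "-" for g in genotypes) / len(genotypes)  # fraction of missing calls
    hetero = sum(g == "h" for g in genotypes)                    # number of heterozygous calls
    return missing <= max_missing and hetero <= max_hetero and depth >= min_depth
```

For example, a marker with 3 missing calls out of 20 individuals (15% missing) would be dropped under these thresholds.<br />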
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3/4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
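PE-pairing.py is not shown in these notes. A minimal sketch of read pairing, assuming reads are matched by name (with a trailing /1 or /2 removed) and that the second file fits in memory; read_fastq and pair_reads are hypothetical names:<br />

```python
from itertools import islice


def read_fastq(handle):
    """Yield (name, record_lines) for each 4-line FASTQ record from a file-like iterator."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        name = record[0].strip()[1:].split()[0]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name, record


def pair_reads(handle1, handle2):
    """Return two record lists restricted to names present in both files, in file-1 order."""
    second = dict(read_fastq(handle2))
    out1, out2 = [], []
    for name, record in read_fastq(handle1):
        if name in second:
            out1.append(record)
            out2.append(second[name])
    return out1, out2
```

Both arguments must be iterators such as open file handles; for large read sets, an on-disk index or sorted merge would use less memory than the dictionary used here.<br />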
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
then simply assembled them together (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align by using blasr algorithm , but sam or bam is no longer supported in blasr, so just use bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so re-installed blasr (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
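vcf_filtering.py is not shown, and the notes do not state what 0.15 refers to. A minimal sketch under the assumption that it is a minimum alternate-allele fraction computed from the per-sample AD field that mpileup writes with -t AD; filter_by_alt_fraction is a hypothetical name:<br />

```python
def filter_by_alt_fraction(lines, min_fraction=0.15):
    """Keep header lines and records whose alt-allele read fraction is >= min_fraction."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        f = line.rstrip("\n").split("\t")
        keys = f[8].split(":")          # FORMAT column, e.g. PL:DP:AD
        if "AD" not in keys:
            continue
        values = f[9].split(":")        # first (only) sample column
        depths = [int(x) for x in values[keys.index("AD")].split(",") if x != "."]
        total = sum(depths)
        # depths[0] is the reference allele; the rest are alternate alleles
        if total > 0 and sum(depths[1:]) / total >= min_fraction:
            kept.append(line)
    return kept
```

Only the first sample column is inspected, which matches the single-BAM mpileup call above.<br />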
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
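getCDS.py is not reproduced in these notes. A minimal sketch of the GFF case only, assuming CDS segments are grouped by their Parent (or ID) attribute, joined in coordinate order, and reverse-complemented on the minus strand; read_fasta and cds_sequences are hypothetical names:<br />

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Return {sequence name: sequence} from FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def cds_sequences(fasta, gff_lines):
    """Return {gene id: joined CDS sequence} from GFF3 'CDS' features."""
    parts = {}  # gene id -> (seqid, strand, [(start, end), ...])
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in f[8].split(";") if "=" in a)
        gene = attrs.get("Parent") or attrs.get("ID")
        if gene is None:
            continue
        entry = parts.setdefault(gene, (f[0], f[6], []))
        entry[2].append((int(f[3]), int(f[4])))
    result = {}
    for gene, (seqid, strand, segs) in parts.items():
        # GFF coordinates are 1-based and inclusive
        seq = "".join(fasta[seqid][s - 1:e] for s, e in sorted(segs))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        result[gene] = seq
    return result
```

Trans-spliced chloroplast genes, whose segments lie on different strands, are not handled by this sketch and would need special treatment.<br />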
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer, having a problem with memory, was re-installed with a memory configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that markers from the same LG land on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like they were split-mapped, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
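fasta_devide.py is not shown; contig_compare.sh only implies that it produces 20 files named final.contigs_devide0.fa to final.contigs_devide19.fa. A minimal sketch of the splitting step, assuming records are dealt out round-robin so the chunks hold similar numbers of contigs (the real script may split differently; split_round_robin is a hypothetical name):<br />

```python
def split_round_robin(records, n_chunks=20):
    """Distribute records over n_chunks lists, one record at a time in rotation."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix="final.contigs_devide"):
    """Write each chunk of (header, sequence) pairs to <prefix><i>.fa."""
    for i, chunk in enumerate(chunks):
        with open("%s%d.fa" % (prefix, i), "w") as out:
            for header, seq in chunk:
                out.write(">%s\n%s\n" % (header, seq))
```

Round-robin keeps the 20 background blat jobs roughly balanced by contig count, though not necessarily by total sequence length.<br />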
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure follows these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
Before running RepeatMasker, please install Repbase, rmblast, trf(Tandem Repeat Finder)<br />
<br />
Repbase<br />
63:/data/skyts0401/program/<br />
should register http://www.girinst.org/<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast<br />
63:/data/skyts0401/program/<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
ver 2.6.0 failed to install, so I installed v 2.2.28 instead<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf<br />
63:/data/skyts0401/program/<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T03:59:57Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so constructed the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3/4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
then simply assembled them together (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but sam or bam is no longer supported in blasr, so the bowtie algorithm is used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
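vcf_filtering.py itself is not shown in these notes. As a rough sketch only, assuming the 0.15 in the output name is a minimum non-reference read fraction taken from the AD field of the first sample, such a filter could look like this (alt_fraction and filter_vcf are hypothetical names):<br />

```python
def alt_fraction(vcf_line):
    """Return the non-reference read fraction of the first sample, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys or keys.index("AD") >= len(values):
        return None
    try:
        depths = [int(x) for x in values[keys.index("AD")].split(",")]
    except ValueError:  # e.g. "." when no depth was recorded
        return None
    total = sum(depths)
    if total == 0:
        return None
    # AD lists the reference depth first, then one depth per ALT allele.
    return (total - depths[0]) / float(total)


def filter_vcf(lines, min_fraction=0.15):
    """Yield header lines and records whose alt fraction is >= min_fraction.

    The 0.15 default is an assumption based only on the output file name.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fraction = alt_fraction(line)
        if fraction is not None and fraction >= min_fraction:
            yield line
```

Passing an open file handle as lines and printing each yielded line would reproduce the redirect-to-file usage of the command above.<br />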
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Write a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
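getCDS.py is not reproduced here. The following is a minimal sketch of the GFF case only, assuming GFF3-style attributes (Parent= or ID=) and ignoring trans-spliced genes whose segments lie on different strands; read_fasta, reverse_complement and cds_from_gff are illustrative names, not the original code:<br />

```python
from collections import OrderedDict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def read_fasta(lines):
    """Parse FASTA lines into an ordered {name: sequence} mapping."""
    seqs = OrderedDict()
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return OrderedDict((k, "".join(v)) for k, v in seqs.items())


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def cds_from_gff(seqs, gff_lines):
    """Join the CDS segments of each gene and return {gene: CDS sequence}."""
    segments = OrderedDict()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        gene = attrs.get("Parent") or attrs.get("ID") or "%s_%s_%s" % (f[0], f[3], f[4])
        segments.setdefault((gene, f[0], f[6]), []).append((int(f[3]), int(f[4])))
    out = OrderedDict()
    for (gene, seqid, strand), parts in segments.items():
        parts.sort()
        # GFF coordinates are 1-based and inclusive.
        cds = "".join(seqs[seqid][start - 1:end] for start, end in parts)
        out[gene] = reverse_complement(cds) if strand == "-" else cds
    return out
```

Writing each entry of the returned mapping as a ">name" line followed by the sequence gives a gene FASTA comparable to vr.pb.cp.gene.fasta.<br />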
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
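The thresholds noted above (dp >= 5, missing < 13, hetero < 10) can be illustrated with a small sketch. It is not vcfparse.py; it assumes that a call with depth below 5 counts as missing and that the two limits are numbers of samples, and classify and keep_marker are hypothetical names:<br />

```python
def classify(sample_field, keys, min_dp=5):
    """Classify one sample column as 'missing', 'hetero' or 'homo'."""
    values = dict(zip(keys, sample_field.split(":")))
    gt = values.get("GT", ".").replace("|", "/")
    dp = values.get("DP", "0")
    # Assumption: low-depth calls are treated the same as missing calls.
    if "." in gt or not dp.isdigit() or int(dp) < min_dp:
        return "missing"
    alleles = gt.split("/")
    return "hetero" if len(set(alleles)) > 1 else "homo"


def keep_marker(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if the record passes the missing and hetero limits."""
    f = vcf_line.rstrip("\n").split("\t")
    keys = f[8].split(":")
    calls = [classify(sample, keys, min_dp) for sample in f[9:]]
    return calls.count("missing") < max_missing and calls.count("hetero") < max_hetero
```

Lowering max_missing to 12 corresponds to the stricter filter used on 2/23.<br />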
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
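cal_seg_dist.py is not shown. As an illustration of a segregation distortion check, the sketch below runs a chi-square test against a 1:1 ratio. It assumes the loc file codes the two parental genotypes as a and b and that a 1:1 ratio is the expected one for this population, neither of which is stated in the notes:<br />

```python
# Chi-square critical value for 1 degree of freedom at p = 0.05.
CHI2_CRIT_DF1_P05 = 3.841


def chi_square_1to1(genotypes):
    """Chi-square statistic for a:b counts against a 1:1 expectation.

    Any code other than 'a' or 'b' (for example '-') is ignored.
    Returns None when no informative genotype is present.
    """
    a = sum(1 for g in genotypes if g == "a")
    b = sum(1 for g in genotypes if g == "b")
    n = a + b
    if n == 0:
        return None
    # With expected counts n/2 each, the statistic reduces to (a - b)^2 / n.
    return (a - b) ** 2 / float(n)


def is_distorted(genotypes):
    """Return True if the marker deviates from 1:1 at the 5% level."""
    chi2 = chi_square_1to1(genotypes)
    return chi2 is not None and chi2 > CHI2_CRIT_DF1_P05
```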
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
Install MUMmer to draw a dot plot between the PacBio assembly and the previous reference<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
There is a problem with the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the PacBio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
Then draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option, which mummerplot writes into its script. Just edit out.gp to delete that line.<br />
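Deleting the offending line from out.gp by hand works. If the plot is regenerated often, a small helper can do the same thing. This is only a sketch; strip_unsupported_option is a hypothetical name, and the file handling in the trailing comment assumes the out.gp script sits in the current directory:<br />

```python
def strip_unsupported_option(gp_lines, option="set mouse clipboardformat"):
    """Return the gnuplot script lines without those starting with option."""
    return [line for line in gp_lines if not line.lstrip().startswith(option)]


# Example use on the generated script (not executed here):
#
#     with open("out.gp") as handle:
#         cleaned = strip_unsupported_option(handle.readlines())
#     with open("out.gp", "w") as handle:
#         handle.writelines(cleaned)
#
# Afterwards, gnuplot out.gp should run without the "wrong option" error.
```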
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
Compare the PacBio assembly with the previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds, to check that the same markers are on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
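The marker check in step 1 depends on reading the tabular BLAST output (-outfmt 6). blastparse.py is not shown, so the following is only a sketch of one plausible approach, assuming the best hit per marker is the one with the highest bit score; best_hits is a hypothetical name:<br />

```python
def best_hits(blast_lines):
    """Return {query: (subject, start, end, bitscore)} for the top hit.

    Expects the 12 default -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bitscore = int(f[8]), int(f[9]), float(f[11])
        if query not in best or bitscore > best[query][3]:
            # Minus-strand hits have sstart > send, so order the coordinates.
            best[query] = (subject, min(sstart, send), max(sstart, send), bitscore)
    return best
```

Comparing the subject name of each best hit with the linkage group of the marker shows whether a marker stays on the same chromosome.<br />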
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools commands for newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
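The .mapping.depth files have one line per position (contig, position, depth). To compare Illumina and PacBio coverage over the 2 Mb region, averaging the depth per window can help. A minimal sketch, with the 10 kb window size being an arbitrary assumption and window_means a hypothetical name:<br />

```python
from collections import OrderedDict


def window_means(depth_lines, window=10000):
    """Average samtools depth output per fixed-size window.

    Returns an ordered mapping {(contig, window_start): mean_depth},
    where window_start is the 1-based first position of the window.
    """
    sums = OrderedDict()
    for line in depth_lines:
        f = line.split()
        if len(f) < 3:
            continue
        contig, pos, depth = f[0], int(f[1]), int(f[2])
        key = (contig, (pos - 1) // window * window + 1)
        total, count = sums.get(key, (0, 0))
        sums[key] = (total + depth, count + 1)
    return OrderedDict((k, t / float(n)) for k, (t, n) in sums.items())
```

Windows where the two read types disagree strongly are candidates for a closer look in a genome browser.<br />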
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
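contig_compare.sh expects 20 query files named final.contigs_devide0.fa to final.contigs_devide19.fa. fasta_devide.py is not shown; one simple way to produce such chunks is a round-robin split of the records, sketched below (split_records is a hypothetical name, and the round-robin order is an assumption):<br />

```python
def split_records(fasta_lines, n_chunks=20):
    """Distribute FASTA records over n_chunks lists in round-robin order.

    Each returned list holds the raw lines (header and sequence) of the
    records assigned to that chunk.
    """
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in fasta_lines:
        if line.startswith(">"):
            index += 1
        if index >= 0:  # skip anything before the first header
            chunks[index % n_chunks].append(line)
    return chunks


# Example use (not executed here), writing names that match the loop above:
#
#     with open("final.contigs.reformed.fasta") as handle:
#         for i, chunk in enumerate(split_records(handle)):
#             with open("final.contigs_devide%d.fa" % i, "w") as out:
#                 out.writelines(chunk)
```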
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
The mungbean super scaffold (JM-2.fasta) was gap-filled. The final assembly FASTA is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>'''Repeat masking program installation'''</big><br />
Before running RepeatMasker, please install Repbase, rmblast, and trf (Tandem Repeats Finder)<br />
Repbase<br />
63:/data/skyts0401/program/<br />
registration at http://www.girinst.org/ is required<br />
download RepBaseRepeatMaskerEdition<br />
tar -xzf RepBaseRepeatMaskerEdition-20170127\ \(1\).tar.gz<br />
A Libraries/ directory will be created; copy all of its files to RepeatMasker/Libraries/<br />
cp Libraries/* RepeatMasker/Libraries/.<br />
<br />
rmblast<br />
63:/data/skyts0401/program/<br />
download from http://www.repeatmasker.org/RMBlast.html<br />
ver 2.6.0 had installation problems, so v2.2.28 was installed instead<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-x64-linux.tar.gz<br />
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ncbi-blast-2.2.28+-x64-linux.tar.gz<br />
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz <br />
tar zxvf ncbi-rmblastn-2.2.28-x64-linux.tar.gz <br />
cp -R ncbi-rmblastn-2.2.28/* ncbi-blast-2.2.28+/<br />
rm -rf ncbi-rmblastn-2.2.28<br />
mv ncbi-blast-2.2.28+ rmblast-2.2.28<br />
<br />
trf<br />
63:/data/skyts0401/program/<br />
download from http://tandem.bu.edu/trf/trf.html<br />
chmod a+x trf409.linux64<br />
ln -s trf409.linux64 RepeatMasker/trf</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T02:09:40Z
<p>Skyts0401: /* Mungbean pacbio assembly */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
Parse genotype data (JoinMap) and phenotype data into ICIMapping format (bip format)<br />
using lgcombine.py (63:/data/skyts0401/Mungbean/MY_UV/)<br />
Found that the linkage group 2 map is wrong, so construct the map again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
Markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
While grouping them, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; check this.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
Move the SAM data (reseq data aligned to the pacbio scaffolds) from the NICEM server to the 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
Construct the genetic map (JoinMap 4.1), just using the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
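For reference, Haldane's mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) in cM, assuming no crossover interference. JoinMap applies this internally; the sketch below only illustrates the formula, and haldane_cm and haldane_r are illustrative names:<br />

```python
import math


def haldane_cm(r):
    """Map distance in centimorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))
```

For small r the distance is close to 100 * r cM, and it grows without bound as r approaches 0.5.<br />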
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from the 244 server to the 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
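PE-pairing.py is not reproduced in these notes. A minimal sketch of read pairing, assuming four-line FASTQ records and read names that differ only by a trailing /1 or /2, keeps only the reads present in both files (read_fastq, read_id and pair_reads are hypothetical names; the mate file is held in memory, so this suits small inputs only):<br />

```python
def read_fastq(lines):
    """Yield FASTQ records as lists of four stripped lines."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i:i + 4]


def read_id(header):
    """Read name without the leading '@', comment, or /1 and /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_reads(fq1_lines, fq2_lines):
    """Return (records1, records2) restricted to reads found in both files.

    The output follows the order of the first file.
    """
    mates = dict((read_id(rec[0]), rec) for rec in read_fastq(fq2_lines))
    paired1, paired2 = [], []
    for rec in read_fastq(fq1_lines):
        rid = read_id(rec[0])
        if rid in mates:
            paired1.append(rec)
            paired2.append(mates[rid])
    return paired1, paired2
```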
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
Before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
Align the canu chloroplast contig file (canu_ctg_cp.fasta) to the canu chloroplast contig assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Assemble the mungbean PacBio corrected reads for the chloroplast with canu, with changed contigFilter parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with bowtie or another aligner.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but sam or bam is no longer supported in blasr, so the bowtie algorithm is used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Write a script that reads a FASTA file and an annotation file (gff or gb) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
Too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
Install MUMmer to draw a dot plot between the PacBio assembly and the previous reference<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
There is a problem with the Makefile, so delete -Werror on line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option (-DSIXTYFOURBITS)<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
Then use nucmer to align the PacBio assembly and the previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
Then draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports this option, which mummerplot writes into its script. Just edit out.gp to delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
Compare the PacBio assembly with the previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds, to check that the same markers are on the same chromosomes. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make the Jatropha figure (chr - lg) for the new version (allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure follows these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
<br />
<big>Repeat masking program installation</big></div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T02:09:07Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map is constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam file from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio_chloroplast reads need to be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports sam or bam, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.ba<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
write a script that reads a fasta file and an annotation file (gff or gb) and makes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
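The code of getCDS.py is not recorded in this note. A minimal sketch of the idea for the gff case, assuming GFF3-style attributes (Parent= or ID=) and 1-based inclusive coordinates; the function names read_fasta and cds_from_gff are hypothetical, and trans-spliced genes are not handled:<br />

```python
from collections import defaultdict

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def cds_from_gff(seqs, gff_lines):
    """Join the CDS segments of each parent feature into one sequence."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        key = attrs.get("Parent") or attrs.get("ID")
        if key is None:
            continue  # assumption: unnamed CDS rows are skipped
        segments[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for key, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # order segments by start position
        seq = "".join(seqs[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            # minus strand: reverse complement of the joined segments
            seq = seq.translate(COMPLEMENT)[::-1]
        result[key] = seq
    return result
```

Each returned entry can then be written as a ">name" line followed by the sequence to get a file like vr.pb.cp.gene.fasta.<br />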
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
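The code of vcfparse.py is not shown here. A rough sketch of how the three thresholds above (dp >= 5, missing < 13, hetero < 10) could be applied to one marker, assuming that missing and hetero are counts of lines and that the depth cutoff applies to every called genotype; keep_marker and its arguments are hypothetical names:<br />

```python
def keep_marker(genotypes, depths, min_dp=5, max_missing=13, max_hetero=10):
    """Return True when a marker passes all three filters.

    genotypes: one call per line, e.g. "0/0", "0/1", "1/1" or "./." (missing)
    depths: read depth for each line, in the same order
    Assumption: thresholds are counts of lines, not percentages.
    """
    missing = sum(1 for g in genotypes if "." in g)
    hetero = sum(
        1 for g in genotypes
        if "." not in g and len(set(g.replace("|", "/").split("/"))) > 1
    )
    # Depth is only checked for genotypes that were actually called.
    called = [d for g, d in zip(genotypes, depths) if "." not in g]
    deep_enough = all(d >= min_dp for d in called)
    return deep_enough and missing < max_missing and hetero < max_hetero
```

Changing max_missing to 12 or 18 would correspond to the stricter or looser runs noted in the later entries.<br />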
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for joinmap, so filter with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
and add the directory that contains the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
install MUMmer, for a dot plot between the pacbio assembly and the previous reference<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
The Makefile causes a build problem, so delete -Werror in line 31 of the Makefile and save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer had a memory problem, so it was re-built with the 64-bit configuration <br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
gnuplot seems to have been updated and no longer supports this option, which mummerplot writes into out.gp. Just edit out.gp and delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffolds, to check that each marker lands on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
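blastparse.py itself is not recorded. One plausible way to reduce the -outfmt 6 table to a single location per marker is to keep the highest-bitscore hit for each query; the sketch below assumes the default 12-column tabular layout, and best_hits is a hypothetical name:<br />

```python
def best_hits(blast_lines):
    """Keep the highest-bitscore hit for each query in outfmt 6 output.

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Returns {query: (subject, subject_start, bitscore)}.
    """
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        query, subject = cols[0], cols[1]
        sstart, bitscore = int(cols[8]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, sstart, bitscore)
    return best
```

Comparing the subject name of each best hit with the chromosome of the marker on the previous reference gives the same-chromosome check described in step 1.<br />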
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
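Since the same view, sort, index sequence is repeated for every SAM file in these notes, it can be wrapped in a small helper. This is only a sketch: samtools_steps and run_steps are hypothetical names, and the output file is passed with samtools view -o instead of shell redirection:<br />

```python
import subprocess


def samtools_steps(prefix):
    """Build the view / sort / index commands for <prefix>.sam."""
    return [
        ["samtools", "view", "-Sb", "-o", f"{prefix}.bam", f"{prefix}.sam"],
        ["samtools", "sort", f"{prefix}.bam", "-o", f"{prefix}.sorted.bam"],
        ["samtools", "index", f"{prefix}.sorted.bam"],
    ]


def run_steps(prefix):
    """Run the three steps in order and stop at the first failure."""
    for cmd in samtools_steps(prefix):
        subprocess.run(cmd, check=True)
```

For example, run_steps("newcontig_illumina") would produce newcontig_illumina.sorted.bam and its index, provided samtools is on the PATH.<br />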
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
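The .mapping.depth files hold one line per position (contig, position, depth, tab-separated, for a single BAM). A small sketch for comparing the Illumina and pacbio depth over the region; summarize_depth is a hypothetical helper, not a script used in these notes:<br />

```python
def summarize_depth(depth_lines):
    """Summarize samtools depth output for one BAM file.

    Each line is expected to hold: contig, 1-based position, depth.
    """
    total = 0
    positions = 0
    zero = 0
    for line in depth_lines:
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        depth = int(cols[2])
        positions += 1
        total += depth
        if depth == 0:
            zero += 1
    mean = total / positions if positions else 0.0
    return {"positions": positions, "mean_depth": mean, "zero_positions": zero}
```

Passing an open file handle, e.g. summarize_depth(open("newcontig_pacbio_endtoend.mapping.depth")), works because the function only iterates over lines.<br />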
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
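fasta_devide.py, which prepares the 20 input files read by contig_compare.sh, is not shown. A sketch of one way to do the split, assuming records are dealt round-robin into 20 parts and written as final.contigs_devide0.fa to final.contigs_devide19.fa; split_records and write_parts are hypothetical names and the real script may split differently:<br />

```python
def split_records(records, n_parts=20):
    """Distribute (name, sequence) records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def write_parts(parts, prefix="final.contigs_devide"):
    """Write each part to <prefix><index>.fa in FASTA format."""
    for i, part in enumerate(parts):
        with open(f"{prefix}{i}.fa", "w") as out:
            for name, seq in part:
                out.write(f">{name}\n{seq}\n")
```

Round-robin keeps the number of contigs per file even, but not the total sequence length, so some blat jobs may still run longer than others.<br />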
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make the Jatropha figure (chr - lg) for the new version (allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure follows these sites: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
# Repeat masking program installation</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T02:08:46Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map is constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, found that vr03 and vr04 are combined into one group and vr05 is split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the linkage groups where chr 3 and 4 are combined and chr 5 is split.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam file from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
and we have 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.g<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the Pacbio_chloroplast reads need to be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but blasr no longer supports sam or bam, so just use the bowtie algorithm) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt and re-installed with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot; just edit out.gp and delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking process is based on these sites: [http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/]<br />
# Repeat masking program installation</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T02:06:50Z
<p>Skyts0401: /* 5/1 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 are filtered out<br />
while grouping them, vr03 and vr04 were combined into one group and vr05 was split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), using only the combined chr 3/4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane mapping function<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam files from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before running, copy the fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
The configuration file is at /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assemble (canu) the mungbean PacBio corrected reads for chloroplast, with changed parameters (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
This gives 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
Then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, the PacBio chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not with standalone bowtie or another program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align with the blasr algorithm, but SAM/BAM is no longer supported in blasr, so the bowtie algorithm was used instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (probably a library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a FASTA file and an annotation file (GFF or GenBank) and writes a FASTA file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
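The source of getCDS.py is not recorded in this note. As a rough sketch only, a script of this kind could look like the following for the GFF case. The function names, the grouping of CDS segments by the Parent (or ID) attribute, and the handling of the minus strand are assumptions; GenBank input and trans-spliced genes are not covered.<br />

```python
from collections import defaultdict

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def read_fasta(lines):
    """Parse FASTA lines into {sequence name: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def read_cds_segments(gff_lines):
    """Group CDS features by their Parent (or ID) attribute (assumed convention)."""
    genes = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        gene = attrs.get("Parent") or attrs.get("ID") or "unnamed"
        # GFF coordinates are 1-based and inclusive.
        genes[gene].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes


def cds_sequences(seqs, genes):
    """Yield (gene, CDS sequence) with segments joined in transcript order."""
    for gene, segments in genes.items():
        strand = segments[0][3]
        ordered = sorted(segments, key=lambda s: s[1])
        cds = "".join(seqs[chrom][start - 1:end] for chrom, start, end, _ in ordered)
        if strand == "-":
            # Reverse-complement the joined segments for minus-strand genes.
            cds = cds.translate(COMPLEMENT)[::-1]
        yield gene, cds
```

Each pair yielded by cds_sequences can then be written out as a FASTA record, which is the kind of output redirected to vr.pb.cp.gene.fasta above.<br />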
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
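The filter thresholds above (dp >= 5, missing < 13, hetero < 10) can be illustrated with a small sketch. This is not the code of vcfparse.py, which is not recorded here. It assumes that a genotype with DP below the threshold counts as missing, and that a site is kept when the number of missing samples and the number of heterozygous samples are both below their limits.<br />

```python
def parse_sample(fmt_keys, sample_field):
    """Return (genotype, depth) for one sample column of a VCF record."""
    values = dict(zip(fmt_keys, sample_field.split(":")))
    gt = values.get("GT", "./.").replace("|", "/")
    dp_text = values.get("DP", ".")
    depth = int(dp_text) if dp_text.isdigit() else 0
    return gt, depth


def keep_site(line, min_dp=5, max_missing=13, max_hetero=10):
    """Decide whether one VCF data line passes the filters (assumed semantics)."""
    fields = line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    missing = hetero = 0
    for sample_field in fields[9:]:
        gt, depth = parse_sample(fmt_keys, sample_field)
        alleles = gt.split("/")
        if "." in alleles or depth < min_dp:
            # No call, or too shallow to trust: count the sample as missing.
            missing += 1
        elif len(set(alleles)) > 1:
            hetero += 1
    return missing < max_missing and hetero < max_hetero


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or keep_site(line, **thresholds):
            yield line
```

Under these assumptions, the stricter 2/23 run would correspond to calling filter_vcf with max_missing=12.<br />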
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filter again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
Then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt and re-installed with the 64-bit flag<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it fails with an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot; just edit out.gp and delete that line.<br />
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the PacBio super scaffolds to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads looked like split mappings, so re-align with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
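To compare the two .mapping.depth files, a short summary per file is often enough. The sketch below assumes the usual three-column, tab-separated output of samtools depth (contig, position, depth); the function name and the reported fields are only illustrative.<br />

```python
def summarize_depth(lines):
    """Summarize 'samtools depth' output (tab-separated: contig, position, depth)."""
    n = total = zero = 0
    for line in lines:
        if not line.strip():
            continue
        _contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        depth = int(depth)
        n += 1
        total += depth
        # With -a, uncovered positions are reported with depth 0.
        zero += depth == 0
    mean = total / n if n else 0.0
    return {"positions": n, "mean_depth": mean, "zero_depth": zero}
```

For example, open newcontig_illumina_endtoend.mapping.depth and pass the file object to summarize_depth, then do the same for the PacBio file and compare the zero_depth counts in the region.<br />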
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
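contig_compare.sh expects 20 input files named final.contigs_devide0.fa to final.contigs_devide19.fa. The code of fasta_devide.py is not recorded here; the following is only a sketch of one way to produce such files. Balancing the parts by total sequence length is an assumption made so that the parallel blat jobs take similar time.<br />

```python
def read_records(lines):
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_parts=20):
    """Distribute records over n_parts buckets, longest sequences first."""
    buckets = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        # Always add to the bucket that currently holds the fewest bases.
        smallest = sizes.index(min(sizes))
        buckets[smallest].append((header, seq))
        sizes[smallest] += len(seq)
    return buckets


def write_parts(buckets, prefix="final.contigs_devide"):
    """Write bucket i to '<prefix><i>.fa', the names read by contig_compare.sh."""
    for i, bucket in enumerate(buckets):
        with open(f"{prefix}{i}.fa", "w") as out:
            for header, seq in bucket:
                out.write(f">{header}\n{seq}\n")
```

Reading final.contigs.reformed.fasta with read_records, passing the result through split_records, and calling write_parts would give the 20 files used in the loop.<br />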
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
Repeat masking process is based on these sites {{각주}} http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
# Repeat masking program installation</div>
Skyts0401
http://plantgenomics.snu.ac.kr/mediawiki-1.21.3/index.php/2017_Haneul_Lab_note
2017 Haneul Lab note
2017-05-01T02:06:17Z
<p>Skyts0401: /* 4/26 */</p>
<hr />
<div>== 1 / 9 ==<br />
=== Minyoung_UV_QTL ===<br />
parsing genotype data(Joinmap) and phenotype data to ICImapping format(bip format)<br />
using lgcombine.py(63:/data/skyts0401/Mungbean/MY_UV/)<br />
found that the linkage group 2 map is wrong, so the map was constructed again<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
make loc file(244:/home/skyts0401/reseq/chr/Mungbean_chr_coseq_parse_seg_dist.loc)<br />
markers with missing > 10%, hetero > 10, or depth < 3 were filtered out<br />
while grouping them, found that vr03 and vr04 were combined into one group and vr05 was split into 2 groups; check it.<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
moving SAM data(align reseq data on pacbio-scaffold) from NICEM server to 244 server (244:/kev8305/SK3/)<br />
<br />
== 1 / 10 ==<br />
=== Minyoung_UV_QTL ===<br />
QTL analysis by using IciMapping<br />
<br />
<br />
=== Mungbean synchronous QTL ===<br />
construct genetic map (JoinMap 4.1), just using the combined chr 3, 4 linkage group and the split chr 5 linkage groups.<br />
<br />
ML method, Haldane algorithm<br />
<br />
<br />
=== Mungbean pacbio assembly ===<br />
convert SAM format to BAM format (244:/kev8305/SK3/)<br />
./convertbam.sh<br />
<br />
== 1/ 11 ==<br />
=== Mungbean synchronous QTL ===<br />
QTL analysis by using RQTL(desktop:/Users/sky/desktop/Mungbean_syn_RQTL.csv)<br />
just for checking locus<br />
<br />
== 1/16 ==<br />
=== Mungbean pacbio assembly ===<br />
copying sorted.bam file from 244 server to 63 server<br />
<br />
variant calling (244:/kev8305/SK3/, 63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants.vcf<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -b bam_list | bcftools call -v -m -O v > variants_snp.vcf<br />
<br />
== 1/18 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
bwa index falcon_500_sspace.final.scaffolds.fasta<br />
bwa mem -t 10 falcon_500_sspace.final.scaffolds.fasta KJ-C_1.fastq.gz KJ-C_2.fastq.gz > KJ-pe_falcon_scaffold.sam<br />
<br />
== 1/19 ==<br />
=== Mungbean pacbio assembly ===<br />
variant calling Kyoungki Jarae #5 with pacbio falcon scaffold (63:/data/skyts0401/Mungbean/mapping/resequencing/)<br />
samtools view -Sb KJ-pe_falcon_scaffold.sam > KJ-pe_falcon_scaffold.bam<br />
samtools sort KJ-pe_falcon_scaffold.bam -o KJ-pe_falcon_scaffold.sorted.bam<br />
samtools index KJ-pe_falcon_scaffold.sorted.bam<br />
samtools mpileup -f falcon_500_sspace.final.scaffolds.fasta -I -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u KJ-pe_falcon_scaffold.sorted.bam | bcftools call -v -m -O v > KJ_falcon_scaffold_variants_snp.vcf<br />
<br />
== 1/24 ==<br />
=== Jatropha assembly ===<br />
make svg file for superscaffold - linkage group marker location (63:/home/skyts0401/svg/)<br />
python make_chr_lg_svg.py standard_output.final.scaffolds.fasta.tr.JM_out.fa standard_output.final.scaffolds.fasta LG.total.txt.reformed standard_output.final.scaffolds.fasta.tr.JM_out.fa.log > chr_lg.svg<br />
<br />
== 1/31 ==<br />
=== Mungbean Chloroplast assembly ===<br />
pairing Illumina PE read (63:/home/skyts0401/)<br />
sudo python PE-pairing.py /data/jungminh/mungbean/PE/SunhwaN_1_cont.fq /data/jungminh/mungbean/PE/SunhwaN_2_cont.fq<br />
<br />
== 2/2 ==<br />
=== Mungbean Chloroplast assembly ===<br />
(63:/data/skyts0401/Mungbean/chloroplast/)<br />
gmap_build -D gmap_db -d v.radiata v.radiata.fasta<br />
gmap --nosplicing -D gmap_db -n 1 -d v.radiata -f samse scaf_cp_20k.fasta -t 12 | samtools view -Sb > Vr-cp_scaf-cp-20k.bam<br />
samtools sort Vr-cp_scaf-cp-20k.bam -o Vr-cp_scaf-cp-20k.sorted.bam<br />
samtools index Vr-cp_scaf-cp-20k.sorted.bam<br />
<br />
== 2/3 ~ 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
falcon - path : 63:/home/skyts0401/Falcon_RE/rere/<br />
<br />
before run, copy fc_env folder (63:/data/skyts0401/Falcon/)<br />
cp -r ~/FALCON_RE/rere/FALCON-integrate/fc_env YOUR_FOLDER<br />
<br />
and configure file is on /home/skyts0401/fc_run.cfg<br />
<br />
<br />
<br />
align canu contig_cp file to canu contig_cp assembly (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/bowtie2-2.2.9/bowtie2-build Vr_cp_canu.contigs.for.mapping.fasta Vr_cp_canu.contigs.for.mapping.fasta<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f canu_ctg_cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_canu_ctg_revised.sam<br />
samtools view -Sb cp-assembly_canu-ctg.sam > cp-assembly_canu-ctg.bam<br />
samtools sort cp-assembly_canu-ctg.bam -o cp-assembly_canu-ctg.sorted.bam<br />
samtools index cp-assembly_canu-ctg.sorted.bam<br />
samtools faidx canu_ctg_cp.fasta<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -f pb.cp.fasta --end-to-end --very-fast -p 20 -S cp-assembly_pb-cp.sam<br />
....<br />
~/bowtie2-2.2.9/bowtie2 -x Vr_cp_canu.contigs.for.mapping.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 20 -S cp-assembly_PE-cp.sam<br />
<br />
== 2/6 ==<br />
=== Mungbean Chloroplast assembly ===<br />
assembly (canu) mungbean pacbio corrected read for chloroplast, parameter changed (63:/data/skyts0401/Mungbean/chloroplast/)<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read5 -d assembly/cp_read5 genomeSize=154k contigFilter="5 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
~/canu/Linux-amd64/bin/canu -assemble -p cp_read10 -d assembly/cp_read10 genomeSize=154k contigFilter="10 1000 0.75 0.75 2" -pacbio-corrected pb.cp.fasta<br />
<br />
this gave 2 contigs (one contig has LSC+IR, and the other contig has SSC+IR)<br />
<br />
then just assemble them (cp_1.fa, cp_2.fa, cp_3.fa)<br />
<br />
== 2/9 ==<br />
=== Mungbean Chloroplast assembly ===<br />
quiver(GenomicConsensus) install(63:/data/kev8305/skyts0401/program)<br />
--- boost (ConsensusCore dependency) ---<br />
wget https://sourceforge.net/projects/boost/files/boost/1.63.0/boost_1_63_0.tar.gz<br />
tar -xf boost_1_63_0.tar.gz<br />
cd boost_1_63_0/<br />
./bootstrap.sh<br />
sudo apt-get install python-dev (solution for error-pyconfig.h)<br />
sudo ./b2 install<br />
<br />
--- swig (ConsensusCore dependency) ---<br />
wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz<br />
tar -xf swig-3.0.12.tar.gz <br />
cd swig-3.0.12/<br />
./configure <br />
make<br />
sudo make install<br />
<br />
--- ConsensusCore (GenomicConsensus dependency) ---<br />
git clone https://github.com/PacificBiosciences/ConsensusCore.git<br />
cd ConsensusCore/<br />
sudo python setup.py install<br />
<br />
--- GenomicConsensus ---<br />
git clone https://github.com/PacificBiosciences/GenomicConsensus.git<br />
sudo apt-get install libhdf5-serial-dev (solution for error-hdf5.h)<br />
sudo make<br />
<br />
<br />
Align PacBio_chloroplast read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -f pb.cp.fasta --end-to-end --very-fast -p 4 -S cp-assembly_pb-cp.sam<br />
samtools view -Sb cp-assembly_pb-cp.sam > cp-assembly_pb-cp.bam<br />
<br />
Align Illumina Paired-End read to vr.pb.cp.fasta(PacBio cp assembly) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
bowtie2 -x vr.pb.cp.fasta -1 SunhwaN_1_cont.fq.pairing.fq -2 SunhwaN_2_cont.fq.pairing.fq --end-to-end --very-fast -p 4 -S cp-assembly_PE-cp.sam<br />
samtools view -Sb cp-assembly_PE-cp.sam > cp-assembly_PE-cp.bam<br />
Polishing by Quiver<br />
<br />
== 2/10 ==<br />
=== Mungbean Chloroplast assembly ===<br />
For Quiver, Pacbio_chloroplast reads must be aligned to vr.pb.cp.fasta with pbalign, not bowtie or some other program.<br />
<br />
pbalign install (63:/kev8305/skyts0401/program)<br />
--- blasr (pbalign dependency) ---<br />
https://github.com/PacificBiosciences/blasr/blob/master/doc/INSTALL_MAKE.md<br />
<br />
--- pbcommand (quiver dependency) ---<br />
git clone https://github.com/PacificBiosciences/pbcommand.git<br />
cd pbcommand<br />
sudo python setup.py install<br />
<br />
--- pbalign ---<br />
git clone https://github.com/PacificBiosciences/pbalign.git<br />
cd pbalign/<br />
sudo pip install .<br />
<br />
pbalign (tried to align using the blasr algorithm, but sam/bam output is no longer supported in blasr, so used the bowtie algorithm instead) (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
pbalign --noSplitSubreads --nproc 4 --algorithm bowtie pb.cp.fasta vr.pb.cp.fasta cp-assembly-pb-cp.for.quiver.sam<br />
<br />
== 2/13 ==<br />
=== Mungbean Chloroplast assembly ===<br />
An error occurred while running pbalign, so blasr was re-installed (suspected library error)<br />
<br />
== 2/14 ==<br />
=== Mungbean Chloroplast assembly ===<br />
variant calling with PE and PB read on chloroplast assembly genome (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_pb-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_pb_variants.vcf<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u cp-assembly_PE-cp.sorted.bam | bcftools call -v -m -O v > vr.cp_PE_variants.vcf<br />
<br />
== 2/16 ==<br />
=== Mungbean Chloroplast assembly ===<br />
Align PE reads to vr.pb.cp.fasta by using bwa (244:/kev8305/Mungbean_assembly/chloroplast/)<br />
bwa index vr.pb.cp.fasta<br />
bwa mem -t 4 vr.pb.cp.fasta SunhwaN_1_cont.fq.pairing.fq SunhwaN_2_cont.fq.pairing.fq > vr.pb.cp_PE.sam<br />
samtools view -Sb vr.pb.cp_PE.sam > vr.pb.cp_PE.bam<br />
samtools sort vr.pb.cp_PE.bam -o vr.pb.cp_PE.sorted.bam<br />
samtools index vr.pb.cp_PE.sorted.bam<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam | bcftools call -v -m -O v > variants_PE_bwa.vcf<br />
....<br />
bwa mem -t 4 vr.pb.cp.fasta pb.cp.fasta > vr.pb.cp_PB.sam<br />
samtools view -Sb vr.pb.cp_PB.sam > vr.pb.cp_PB.bam<br />
samtools sort vr.pb.cp_PB.bam -o vr.pb.cp_PB.sorted.bam<br />
samtools index vr.pb.cp_PB.sorted.bam<br />
....<br />
samtools mpileup -f vr.pb.cp.fasta -v -t DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR -u vr.pb.cp_PE.sorted.bam > variants_PE_bwa_all.vcf<br />
python vcf_filtering.py variants_PE_bwa_all.vcf > variants_PE_bwa_all_0.15.vcf<br />
<br />
== 2/21 ==<br />
=== Mungbean Chloroplast assembly ===<br />
wrote a script that reads a fasta file and an annotation file (gff or gb) and writes a fasta file of gene CDS sequences (63:/kev8305/Mungbean_assembly/chloroplast/)<br />
python getCDS.py vr.pb.cp.fasta vr.pb.cp.gff > vr.pb.cp.gene.fasta<br />
python getCDS.py v.radiata.fasta v.radiata.gb > v.radiata.gene.fasta<br />
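<br />
A minimal sketch of what a getCDS.py-style script could look like for the gff case. The real getCDS.py is not shown in this note, so the function names (read_fasta, cds_from_gff), the output label format and the handling below are assumptions. It treats every CDS row independently (multi-exon CDS are not joined) and does not cover the gb input.<br />

```python
# Hypothetical sketch, not the actual getCDS.py used in this note.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_name: sequence} from an iterable of FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first word of the header
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def cds_from_gff(seqs, gff_lines):
    """Yield (label, sequence) for each CDS row of a GFF file."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        seq = seqs[seqid][start - 1:end]  # GFF coordinates are 1-based, inclusive
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        yield f"{seqid}:{start}-{end}({strand})", seq
```

To use it, open the fasta and gff files, pass them to read_fasta and cds_from_gff, and print each pair as a ">label" line followed by the sequence.<br />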
<br />
== 2/22 ==<br />
=== Mungbean pacbio assembly ===<br />
snp calling done, snp filtering for genetic map construction (244:/kev8305/SK3/)<br />
python ~/reseq/vcfparse_parent.py variants_snp.vcf KJ_falcon_scaffold_variants_snp.vcf<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf (dp >= 5, missing < 13, hetero < 10)<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_3.loc<br />
python locparse.py Mungbean_pacbio_scaffold_3_seg_dist.loc > Mungbean_pacbio_scaffold_3_seg_dist_format.loc (scaffold name is too long, eliminate '|')<br />
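<br />
The thresholds noted above (dp >= 5, missing < 13, hetero < 10) can be sketched as a per-site check. This is only an illustration, not the actual vcfparse.py: it assumes the thresholds are sample counts, that a call with DP below the cutoff counts as missing, and that GT and DP are both present in the FORMAT column. The name keep_site is hypothetical.<br />

```python
# Hypothetical sketch of a per-site filter; the interpretation of the
# thresholds (counts of samples) is an assumption, not taken from vcfparse.py.
def keep_site(vcf_line, min_dp=5, max_missing=13, max_hetero=10):
    """Return True if one VCF data line passes the three thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")            # FORMAT column, e.g. GT:DP
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    missing = hetero = 0
    for sample in cols[9:]:
        fields = sample.split(":")
        alleles = fields[gt_i].replace("|", "/").split("/")
        dp = 0
        if dp_i < len(fields) and fields[dp_i].isdigit():
            dp = int(fields[dp_i])
        if "." in alleles or dp < min_dp:
            missing += 1                 # no call, or too shallow to trust
        elif len(set(alleles)) > 1:
            hetero += 1                  # e.g. 0/1
    return missing < max_missing and hetero < max_hetero
```

Header lines (starting with "#") should be passed through unchanged before applying keep_site to the data lines.<br />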
<br />
== 2/23 ==<br />
=== Mungbean pacbio assembly ===<br />
too many SNPs for JoinMap, so filtered again with missing < 12<br />
python ~/reseq/vcfparse.py variants_snp_compare_parents.vcf<br />
python ~/reseq/vcfparse_coseg.py variants_snp_compare_parents_filtered.vcf Mungbean_pacbio_scaffold_4.loc<br />
python ~/reseq/cal_seg_dist.py Mungbean_pacbio_scaffold_4.loc <br />
9110<br />
python locparse.py Mungbean_pacbio_scaffold_4_seg_dist.loc > Mungbean_pacbio_scaffold_4_seg_dist_format.loc<br />
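<br />
A segregation distortion check like the one cal_seg_dist.py performs can be sketched as a chi-square test. The actual script is not shown here, so this assumes a population where genotypes are coded "a" and "b" with an expected 1:1 ratio, ignores any other call, and uses the standard critical value 3.841 (df = 1, p = 0.05). The names seg_dist_chi2 and is_distorted are hypothetical.<br />

```python
# Hypothetical sketch; the expected 1:1 ratio and the a/b coding are assumptions.
CHI2_CRIT_DF1_P05 = 3.841  # chi-square critical value, df = 1, p = 0.05


def seg_dist_chi2(genotypes):
    """Chi-square statistic for a 1:1 a:b ratio; other calls are ignored."""
    a = sum(1 for g in genotypes if g == "a")
    b = sum(1 for g in genotypes if g == "b")
    n = a + b
    if n == 0:
        return 0.0
    expected = n / 2
    return ((a - expected) ** 2 + (b - expected) ** 2) / expected


def is_distorted(genotypes, crit=CHI2_CRIT_DF1_P05):
    """True if the marker deviates from 1:1 beyond the critical value."""
    return seg_dist_chi2(genotypes) > crit
```

Markers for which is_distorted returns True would be dropped before writing the _seg_dist.loc file.<br />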
<br />
== 2/27 ==<br />
=== Mungbean pacbio assembly ===<br />
Mungbean_pacbio_scaffold_7_seg_dist_format.loc : no hetero, missing < 18<br />
<br />
<br />
ALLMAPS install (244:/kev8305/skyts0401/program)<br />
easy_install biopython numpy deap networkx matplotlib jcvi<br />
wget https://dl.dropboxusercontent.com/u/15937715/Data/ALLMAPS/ALLMAPS-install.sh<br />
sh ALLMAPS-install.sh<br />
then add the directory containing the ALLMAPS binaries (concorde, faSize, liftOver) to $PATH in ~/.profile<br />
<br />
<br />
ALLMAPS (244:/kev8305/SK3/anchoring)<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_5_joinmap.result > Mungbean_pacbio_5_joinmap.for.allmaps<br />
python ~/reseq/allmaps_format.py Mungbean_pacbio_7_joinmap.result > Mungbean_pacbio_7_joinmap.for.allmaps<br />
python -m jcvi.assembly.allmaps merge Mungbean_pacbio_5_joinmap.for.allmaps Mungbean_pacbio_7_joinmap.for.allmaps -o JM-2.bed<br />
python -m jcvi.assembly.allmaps path JM-2.bed falcon_500_sspace.final.scaffolds.fasta.header.fasta<br />
<br />
== 3/2 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer install, for dot plot between pacbio and previous ref<br />
wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz<br />
tar -xvf MUMmer3.23.tar.gz<br />
cd MUMmer3.23<br />
make check<br />
make install<br />
MUMmer3.23/mummer -mum -b -c Vradi.ver6.cor.fa.chr.fa JM-2.chr.fasta > ref_qry.mums<br />
<br />
== 3/3 ==<br />
=== Mungbean pacbio assembly ===<br />
lastz install (244:/kev8305/skyts0401/program/)<br />
download from http://www.bx.psu.edu/~rsharris/lastz/<br />
tar -xvzf lastz-1.02.00.tar.gz<br />
cd lastz-distrib-1.02.00/src/<br />
----------------------------------<br />
problem with Makefile, so delete -Werror in line 31 of Makefile, save.<br />
----------------------------------<br />
make<br />
make install<br />
add path /home/skyts0401/lastz-distrib/bin in .profile<br />
<br />
<br />
lastz (244:/kev8305/SK3/anchoring/)<br />
lastz JM-2.chr.fasta[multiple] Vradi.ver6.cor.fa --notransition --step=20 --gfextend --chain --gapped --format=sam > old_new.sam<br />
<br />
== 3/7 ==<br />
=== Mungbean pacbio assembly ===<br />
MUMmer ran into a memory problem, so it was rebuilt with the 64-bit option<br />
make clean<br />
make CPPFLAGS="-O3 -DSIXTYFOURBITS"<br />
make install<br />
<br />
<br />
and use nucmer to align pacbio assembly and previous reference<br />
MUMmer3.23/nucmer -maxmatch -c 100 -p ref_qry JM-2.chr.fasta Vradi.ver6.cor.fa<br />
MUMmer3.23/nucmer --noextend -c 100 -p ref_qry_noextend JM-2.chr.fasta Vradi.ver6.cor.fa<br />
<br />
<br />
and draw a dot plot using mummerplot<br />
mummerplot --fat -l -png ref_qry_noextend.delta<br />
but it raises an error like<br />
set mouse clipboardformat "[%.0f, %.0f]"<br />
^<br />
"out.gp", line 2594: wrong option<br />
It seems gnuplot was updated and no longer supports that option written by mummerplot. Just edit out.gp to delete that line.<br />
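<br />
Instead of editing out.gp by hand, the offending line can be removed with a few lines of Python. This is a small sketch; the function name drop_unsupported_lines is hypothetical, and it simply drops every line containing the option named in the error message.<br />

```python
# Hypothetical helper: remove the gnuplot option that newer gnuplot rejects.
def drop_unsupported_lines(gp_text, needle="set mouse clipboardformat"):
    """Return gp_text without any line that contains needle."""
    kept = [ln for ln in gp_text.splitlines(keepends=True) if needle not in ln]
    return "".join(kept)
```

Read out.gp, pass its text through drop_unsupported_lines, write the result back, and run gnuplot on the edited file.<br />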
<br />
/kev8305/skyts0401/program/last-842/scripts/last-dotplot -2 'Vr*' -2 'scaffold_?' -x 1920 -y 1920 ref_qry.maf plot.png<br />
<br />
== 3/16 ~ ==<br />
=== Mungbean pacbio assembly ===<br />
compare between pacbio assembly and previous reference<br />
<br />
1. Map 50 reseq markers per LG from the previous reference onto the pacbio super scaffold to check that the same markers fall on the same chromosome. (244:/kev8305/SK3/anchoring/check)<br />
python SNP_marker_pos.py Vradi_ver6.fa Mungbean_chr_coseg_parse_seg_dist.loc > Vradi.ver6.reseq.marker.fasta<br />
makeblastdb -in JM-2.chr.fasta -dbtype 'nucl' -out Mungbean_pacbio<br />
blastn -db Mungbean_pacbio -query Vradi.ver6.reseq.marker.fasta -outfmt 6 -out reseq_marker.blast -num_threads 2 -evalue 1e-5 -word_size 100<br />
python blastparse.py reseq_marker.blast > reseq_marker_for_svg.result<br />
python chr_compare_svg.py fasta.size reseq_marker_for_svg.result > chr_compare_3.svg (output can be changed based on option in python code)<br />
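<br />
The parsing step after blastn can be sketched as below. blastparse.py itself is not shown in this note, so keeping only the best hit per marker (by bit score) is an assumption, and best_hits is a hypothetical name. The column order is the default for -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).<br />

```python
# Hypothetical sketch: keep the highest-bitscore hit for each query marker.
def best_hits(blast_lines):
    """Return {query: (subject, sstart, send, bitscore)} from outfmt 6 lines."""
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        query, subject = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, sstart, send, bitscore)
    return best
```

The resulting marker-to-scaffold positions are the kind of table a drawing script such as chr_compare_svg.py would need.<br />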
<br />
/data2/skyts0401/program/circos-0.69-4/bin/circos -conf chr_compare.conf (193:/data2/skyts0401/check/circos)<br />
<br />
2. contig compare. (63:/data/skyts0401/Mungbean/assembly/)<br />
scp assembly@147.46.250.181:/home/assembly/data/Mungbean/mapping/p_ctg.longest.fa .<br />
scp skyts0401@147.46.250.244:/kev8305/SK3/anchoring/final.contigs.longest100.fa .<br />
gmap_build -d pacbio_contig_new p_ctg.longest.fa -D ./<br />
gmap -d pacbio_contig_new -D pacbio_contig_new/ final.contigs.longest100.fa -t 12 -f 1 > pacbio_contig_compare.psl<br />
--------------------------------------------------------------------------------------<br />
(NICEM:/home/assembly/check/)<br />
../bwa-0.7.15/bwa mem -t 30 p_ctg.longest.longest3.fa SunhwaN_1.fastq.gz SunhwaN_2.fastq.gz > newcontig_illumina.sam<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
ln -s /NGS/NGS/VignaRadiata/DNA/Sunhwa_pacbio/filtered_subreads.fasta .<br />
bwa index p_ctg.longest.longest1.fa<br />
bwa mem -t 8 p_ctg.longest.longest1.fa filtered_subreads.fasta > newcontig_pacbio.sam<br />
<br />
samtools view -Sb newcontig_pacbio.sam > newcontig_pacbio.bam<br />
samtools sort newcontig_pacbio.bam -o newcontig_pacbio.sorted.bam<br />
samtools index newcontig_pacbio.sorted.bam<br />
~ same samtools command with newcontig_illumina.sam ~<br />
<br />
Some reads appeared to be split-mapped, so re-aligned with the end-to-end mode of bowtie2<br />
(NICEM:~/check/)<br />
~/bowtie2-2.2.9/bowtie2-build p_ctg.longest.longest1.fa p_ctg.longest.longest1.fa<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -1 SunhwaN_1.fastq.gz -2 SunhwaN_2.fastq.gz --end-to-end --very-fast -p 30 -S newcontig_illumina_endtoend.sam<br />
~/bowtie2-2.2.9/bowtie2 -x p_ctg.longest.longest1.fa -f filtered_subreads.fasta --end-to-end --very-fast -p 30 -S newcontig_pacbio_endtoend.sam<br />
<br />
!!!bowtie2-2.3.0 version has a bug!!!<br />
<br />
(244:/kev8305/SK3/anchoring/check/)<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_pacbio_endtoend.sam .<br />
scp assembly@147.46.250.181:/home/assembly/check/newcontig_illumina_endtoend.sam .<br />
~ same samtools command, view, sort, index ~<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_illumina_endtoend.sorted.bam > newcontig_illumina_endtoend.mapping.depth<br />
samtools depth -a -q 0 -Q 0 -r 000000F:2000000-4000000 newcontig_pacbio_endtoend.sorted.bam > newcontig_pacbio_endtoend.mapping.depth<br />
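<br />
The .mapping.depth files hold one line per position (contig, position, depth, tab separated). To compare the Illumina and pacbio depth along the region, the per-base values can be averaged in windows. This is a sketch only; the window size and the name window_mean_depth are assumptions, not something used in this note.<br />

```python
# Hypothetical sketch: mean depth per fixed-size window from samtools depth output.
from collections import defaultdict


def window_mean_depth(depth_lines, window=10000):
    """Return {(contig, window_start): mean depth}; window_start is 1-based."""
    totals = defaultdict(lambda: [0, 0])  # key -> [depth sum, position count]
    for line in depth_lines:
        if not line.strip():
            continue
        contig, pos, depth = line.split("\t")[:3]
        start = (int(pos) - 1) // window * window + 1
        bucket = totals[(contig, start)]
        bucket[0] += int(depth)
        bucket[1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}
```

Running it on both depth files and comparing windows side by side makes sudden drops in one data type easier to spot.<br />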
<br />
blat for comparing contig (NICEM:/home/assembly/check/, 244:/kev8305/SK3/anchoring/check/)<br />
------------------------------<br />
(contig_compare.sh)<br />
#!/bin/bash<br />
<br />
for i in {0..19}; do<br />
../blat p_ctg.longest.longest3.fa final.contigs_devide${i}.fa contig_compare_all_${i}.psl &<br />
done<br />
<br />
wait<br />
------------------------------<br />
<br />
(NICEM)<br />
python fasta_devide.py final.contigs.reformed.fasta <br />
chmod a+x contig_compare.sh <br />
./contig_compare.sh <br />
ls contig_compare_all_*.psl > psl.list<br />
nano pslfilter.py <br />
python pslfilter.py psl.list > contig_compare_all.result<br />
python pslfilter2.py contig_compare_all.result > contig_compare_all_filtered.result<br />
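<br />
The splitting step that produces the 20 input files for contig_compare.sh can be sketched as follows. fasta_devide.py is not shown in this note, so distributing records round-robin and the name split_records are assumptions.<br />

```python
# Hypothetical sketch: distribute FASTA records round-robin into n_parts chunks.
def split_records(fasta_lines, n_parts=20):
    """Return a list of n_parts lists of lines, keeping each record whole."""
    parts = [[] for _ in range(n_parts)]
    index = -1
    for line in fasta_lines:
        if line.startswith(">"):
            index += 1                      # a header starts the next record
        if index >= 0:                      # ignore anything before the first header
            parts[index % n_parts].append(line)
    return parts
```

Each returned chunk would then be written to its own file, for example one per index 0 to 19, so that the blat jobs can run in parallel.<br />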
<br />
== 4/18 ==<br />
=== Jatropha assembly ===<br />
make Jatropha figure(chr - lg) for new version(allmaps) (244:/kev8305/skyts0401/Jatropha)<br />
scp skyts0401@147.46.250.63:/home/skyts0401/svg/make_chr_lg_svg.py make_chr_lg_svg_revised_for_allmaps.py<br />
python make_chr_lg_svg_revised_for_allmaps.py Jatropha_map1.result Jatropha.allmaps.agp > Jatropha_chr_lg.svg<br />
<br />
== 4/26 ==<br />
=== Mungbean pacbio assembly ===<br />
mungbean super scaffold (JM-2.fasta) was gap filled. Final assembly Fasta is in /kev8305/SK3/anchoring/gapfilled_assembly_final/<br />
<br />
== 5/1 ==<br />
=== Mungbean pacbio assembly ===<br />
The repeat masking procedure is based on these sites [footnote]: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced, http://www.repeatmasker.org/<br />
# Repeat masking program installation</div>
Skyts0401